FIGS (Fast Interpretable Greedy-Tree Sums): A method for building interpretable models by simultaneously growing a set of competing decision trees.
Recent advances in machine learning have produced increasingly complex predictive models, often at the expense of interpretability. Interpretability matters, especially in high-stakes applications such as clinical decision-making: interpretable models help with identifying errors, leveraging domain knowledge, and making fast predictions.
In this blog post we’ll talk about FIGS, a new method for fitting an interpretable model that takes the form of a sum of trees. Real-world experiments and theoretical results show that FIGS adapts effectively to a wide range of data structures, achieving state-of-the-art performance in several settings, all without sacrificing interpretability.
How does FIGS work?
Intuitively, FIGS works by extending CART, a typical greedy algorithm for growing a decision tree, to consider growing a sum of trees simultaneously (see Figure 1). At each iteration, FIGS can either grow any existing tree or start a new one; it greedily selects the rule that most reduces the total unexplained variance (or an alternative splitting criterion). To keep the trees in sync with one another, each tree is made to predict the residuals remaining after summing the predictions of all the other trees (see the paper for details).
FIGS is intuitively similar to ensemble approaches such as gradient boosting and random forests, but importantly, since all the trees are grown to compete with one another, the model can better capture the underlying structure of the data. The number of trees and the size/shape of each tree emerge automatically from the data rather than being specified manually.
Fig 1. High-level intuition about how FIGS fits a model.
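As a rough illustration of the residual-fitting idea, the sketch below fits each new shallow tree to the residuals left by the sum of the trees grown so far, using scikit-learn's CART implementation on simulated additive data. Note this is a simplified stagewise sketch, not the full FIGS algorithm, which re-selects the best split across all trees (or a new tree) at every step.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
# Additive ground truth: one component per feature
y = (X[:, 0] > 0).astype(float) + (X[:, 1] > 0.5).astype(float)

trees = []
residual = y.copy()
for _ in range(2):  # grow two one-split trees, one per additive component
    # Each tree is fit to the residuals of the current sum of trees
    t = DecisionTreeRegressor(max_depth=1).fit(X, residual)
    trees.append(t)
    residual = residual - t.predict(X)

# The model's prediction is the sum of the trees' predictions
pred = sum(t.predict(X) for t in trees)
```

On this purely additive target, two single-split trees fit to each other's residuals recover the signal almost exactly, while a single one-split tree could not.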
An example of using FIGS
Using FIGS is extremely simple. It can be easily installed via the imodels package (pip install imodels) and then used in the same way as standard scikit-learn models: simply import a classifier or regressor and call its fit and predict methods. Here is a complete example of using it on a sample clinical dataset, where the objective is to predict the risk of cervical spine injury (CSI).
from imodels import FIGSClassifier, get_clean_dataset
from sklearn.model_selection import train_test_split

# prepare data (in this case, a sample clinical dataset)
X, y, feat_names = get_clean_dataset('csi_pecarn_pred')
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# fit the model
model = FIGSClassifier(max_rules=4)  # initialize a model
model.fit(X_train, y_train)  # fit model
preds = model.predict(X_test)  # discrete predictions: shape is (n_test, 1)
preds_proba = model.predict_proba(X_test)  # predicted probabilities: shape is (n_test, n_classes)

# visualize the model
model.plot(feature_names=feat_names, filename='out.svg', dpi=300)
This results in a simple model: it contains only 4 splits (since we specified that the model should have no more than 4 splits via
max_rules=4). Predictions are made by dropping a sample down each tree and adding up the risk-adjustment values obtained from the resulting leaf of each tree. This model is extremely interpretable, as a clinician can now (i) easily make predictions using the 4 relevant features and (ii) vet the model to ensure it matches their domain expertise. Note that this model is for illustrative purposes only and achieves an accuracy of ~84%.
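To make the prediction mechanics concrete, here is a toy calculation with invented leaf values (the numbers below are hypothetical, not taken from the fitted model): a sample is dropped down each tree, and the risk contributions from the leaves it lands in are summed.

```python
# Hypothetical leaf contributions for one patient from a two-tree
# FIGS model (numbers invented purely for illustration)
leaf_contributions = [0.30, 0.12]  # risk adjustment from each tree's leaf

risk = sum(leaf_contributions)  # summed risk estimate: 0.42
prediction = int(risk > 0.5)    # discrete label at a 0.5 threshold: 0
```

Because the final score is just a sum of a handful of leaf values, a clinician can trace exactly which features moved the risk estimate up or down.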
Fig 2. Simple FIGS learned model for predicting cervical spine injury risk.
If we want a more flexible model, we can also remove the restriction on the number of rules (by changing the code to
model = FIGSClassifier()), resulting in a larger model (see Figure 3). Note that the number of trees and how balanced they are emerge from the structure of the data; only the total number of rules can be specified.
Fig 3. Slightly larger model learned from FIGS to predict risk of cervical spine injury.
How well does FIGS work?
In many cases where interpretability is desired, such as clinical decision rule modeling, FIGS is capable of state-of-the-art performance. For example, Figure 4 shows different datasets where FIGS achieves excellent performance, especially when limited to using very few total splits.
Fig 4. FIGS predicts well with very few splits.
Why does FIGS work well?
FIGS is motivated by the observation that single decision trees often have splits that repeat on different branches, which can occur when there is additive structure in the data. Having multiple trees helps avoid this by decoupling additive components into separate trees.
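This duplication is easy to reproduce. In the sketch below (a minimal demonstration using scikit-learn's CART implementation on simulated data, not part of the FIGS paper), a single depth-2 tree fit to a purely additive target is forced to repeat the same split on the second feature in both branches of the root:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(1000, 2))
# Purely additive target: one component per feature
y = (X[:, 0] > 0).astype(float) + (X[:, 1] > 0).astype(float)

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
split_features = tree.tree_.feature  # negative values mark leaf nodes
n_internal = int(np.sum(split_features >= 0))
counts = np.bincount(split_features[split_features >= 0])
# The root splits on one feature; both of its children must then make
# the same split on the other feature, so that feature appears twice.
```

A sum of two single-split trees represents the same function without any duplication, which is exactly the structure FIGS can discover.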
In general, interpretable modeling offers an alternative to the usual black-box modeling, and in many cases can offer massive improvements in efficiency and transparency without suffering a loss of performance.
This post is based on two papers: FIGS and G-FIGS; all code is available via the imodels package. This is joint work with Keyan Nasseri, Abhineet Agarwal, James Duncan, Omer Ronen, and Aaron Kornblith.
At Ikaroa, we are confident in our ability to use the latest machine learning technology to improve the accuracy and speed of predictive analytics. In particular, we are excited by the prospect, proposed in the recent Berkeley Artificial Intelligence Research blog post above, of attaining XGBoost-level performance while maintaining the interpretability and speed of the CART algorithm.
A model that pairs the predictive flexibility of boosted tree ensembles like XGBoost with the interpretability of CART-style trees would be far easier to evaluate, audit, and deploy. Because we understand the importance of properly understanding the models used in predictive analytics, we are trialing this method to see whether it meets our performance targets while retaining that interpretability. If it does, this kind of technology could change how predictive analytics is used across industries.