
Explainable Boosting Machine: Bridging the Gap between ML and Explainability

By Diksha Tiwari | October 2022



Photo by Andrea De Santis on Unsplash

What is explainability and why is it important?

AI has evolved over the last couple of decades and has found applicability in various domains, including critical systems such as healthcare, the military, banking, and self-driving cars. However, more often than not, the black-box nature of many machine learning models hinders their adoption. For instance, in business, an application of machine learning would be forecasting future sales, predicting customer churn, targeting customers, etc. A black-box model might be able to accurately predict the future sales or customer churn, but it won’t be able to explain how different factors are impacting the output. This is especially important in businesses such as banking that are governed by regulations that require their models to be transparent and structurally unbiased. In addition, an understanding of a model’s decisions helps businesses appreciate how different factors are impacting their business, better prepare for an uncertain future, and make better strategic decisions. This is afforded to them by explainable or glass-box ML models.

Explainability in machine learning refers to the ability of a model or modeling technique to reveal the relationships it has learned between its inputs and its output. Based on the level of explainability offered by algorithms, machine learning models can be broadly classified into two categories: glass-box models and black-box models. This article explores what glass-box and black-box models are, what gives a glass-box model its explainability, and a type of glass-box model called the Explainable Boosting Machine (EBM).

Glass-box models:

These are models that are inherently explainable. Two of the simplest models in this category are linear and logistic regression. Take the example of logistic regression, which predicts the probability of an event as a function of a linear combination of one or more independent variables. The equation for this model can be written as:

log(p / (1 − p)) = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ

where p is the probability of occurrence of the event and log(p / (1 − p)) is the log odds of the event. This ability to express the log odds as a linear combination of the independent variables is what lends the model its explainability: for this logistic regression model, one can say that a unit change in variable x₁ changes the log odds of the event by β₁. Note that the dependent variable does not need to have a linear relationship with the independent variables for explainability. For instance, even a “generalized additive model” of the form

y = β₀ + f₁(x₁) + f₂(x₂) + … + fₙ(xₙ)

is explainable, since you can still estimate how a change in any variable xᵢ impacts the target y. (Spoiler: this is exactly the structure that EBM leverages.)

Black-box models:

These are models in which the relationship between the independent variables and the dependent variable is not apparent. Random forests, gradient-boosted trees, and neural networks are all examples of black-box models.

To illustrate the idea of a black box further, consider the example of a simple regression tree, which has the following mathematical form:

f(x) = Σᵢ kᵢ · I(x ∈ Dᵢ)

where the kᵢ are constants, I(·) is an indicator function that returns 1 if its argument is true and 0 otherwise, and the Dᵢ are disjoint partitions of the training data. The example below illustrates this equation:

[Image by author: an example regression tree that splits the training data into disjoint regions, each with a constant prediction]

Writing the example tree in the more concise indicator form above, we obtain:

[Image by author: the example tree expressed as a sum of indicator functions]
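
Since the original example is an image, here is a hypothetical stand-in that shows the same idea. A tree with a single split on x₁ at the value 5, predicting 3 on the left branch and 7 on the right, can be written as:

f(x) = 3 · I(x₁ < 5) + 7 · I(x₁ ≥ 5)

with disjoint partitions D₁ = {x : x₁ < 5} and D₂ = {x : x₁ ≥ 5}, and constants k₁ = 3, k₂ = 7. A realistic tree has many more splits, so its prediction becomes a long sum of indicator terms that is hard to reason about by inspection, which is what earns such models the black-box label in practice.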

However, glass-box models, although interpretable, are often less accurate than black-box models. EBM tries to bridge this gap by offering a solution that is about as accurate as some black-box models while remaining fairly interpretable.

Explainable Boosting Machine (EBM)

What is EBM?

EBM is a generalized additive model that uses gradient boosting with an ensemble of shallow regression trees [1]. Simply stated, EBM is a generalized additive function of the form

g(E[y]) = β₀ + Σᵢ fᵢ(xᵢ)

where g(·) is a link function (as in generalized linear models). The model trains on one feature at a time in a “round-robin cycle”, using a very low learning rate so that the feature order does not matter [1].

Thus, in iteration 1, a small tree is boosted on each feature in turn, producing one function per feature:

f₁⁽¹⁾(x₁), f₂⁽¹⁾(x₂), …, fₙ⁽¹⁾(xₙ)

In iteration 2, another round of small trees is fit on the residuals:

f₁⁽²⁾(x₁), f₂⁽²⁾(x₂), …, fₙ⁽²⁾(xₙ)

This goes on until iteration r. The final function for each feature is obtained by adding up all the functions learned for that feature, i.e.

fᵢ(xᵢ) = fᵢ⁽¹⁾(xᵢ) + fᵢ⁽²⁾(xᵢ) + … + fᵢ⁽ʳ⁾(xᵢ)

for every feature xᵢ.

For each feature, EBM computes an fᵢ(xᵢ) vs. xᵢ lookup table, which it uses to produce the score vs. xᵢ graphs that help in understanding the relationship between xᵢ and y, as well as the contribution of each feature to the prediction. EBM doesn’t stop here, however: it also includes two-dimensional (pairwise) interactions between variables. Since a two-dimensional interaction can still be rendered as a heat map on a plane, a model that includes such interactions remains interpretable. Thus, the final form of EBM can be represented as:

g(E[y]) = β₀ + Σᵢ fᵢ(xᵢ) + Σᵢ,ⱼ fᵢⱼ(xᵢ, xⱼ)

Traditionally, identifying which interaction terms to include is computationally expensive, especially for large datasets with many variables. EBM solves this problem with a two-stage construction approach, using the FAST algorithm to efficiently rank the candidate pairwise interactions. The two stages are as follows:

1. In stage 1, build the best additive model using only one-dimensional components.

2. In stage 2, fix the one-dimensional functions and model the pairwise interactions on the residuals: select the top K interaction pairs using FAST, then fit a model on the residual R using those pairs, where K is chosen according to the available computing power [2].

Since EBM calculates the final output by adding up the individual contributions of each feature, it is easy to visualize and understand the contributions made by individual features and interaction terms. The price of this modularity is an extra training cost, which makes EBM somewhat slower to train than similar methods. Prediction, however, is not slowed down at all: making a prediction involves only simple additions and lookups inside the feature functions. In fact, this makes EBMs one of the fastest models to execute at prediction time [1].

EBM example:

In the example below, I have used the Credit Card Fraud Detection dataset from Kaggle [3]. The dataset is freely available to share, modify, and use under the ODC ODbL license.

Dataset description:

The dataset contains transactions made with credit cards by European cardholders in September 2013. It is highly unbalanced: the positive class (frauds) accounts for only 0.172% of all transactions. The features V1 through V28 were obtained with PCA; the only features not transformed with PCA are Time (the seconds elapsed between each transaction and the first transaction in the dataset), Amount (the transaction amount), and Class (the target variable) [3].

The EBM model:

Since the data is already processed, we will get directly to the modeling part.

[Image by author: code that fits the EBM model]
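
Since the original shows this step as an image, here is a minimal sketch of what it typically looks like with the interpret package. The file name, split parameters, and hyperparameters are assumptions:

import pandas as pd
from sklearn.model_selection import train_test_split
from interpret.glassbox import ExplainableBoostingClassifier

# Load the Kaggle credit card fraud data (file name is an assumption)
df = pd.read_csv("creditcard.csv")
X, y = df.drop(columns=["Class"]), df["Class"]

# Stratified split to preserve the 0.172% fraud rate in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Fit the EBM; interactions=10 keeps the top 10 pairwise terms ranked by FAST
ebm = ExplainableBoostingClassifier(interactions=10, random_state=42)
ebm.fit(X_train, y_train)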

Once we have the model, let us have a look at how the model is behaving. EBM offers two kinds of explanations: global and local.

Global explanations:

Global explanations help us understand the overall contribution of each feature to the model and how each feature is related to the predictions.

1. Understanding the overall contribution of features to the model:
[Image by author: overall feature importance scores from the EBM global explanation]
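
These plots come from interpret’s interactive dashboard. A sketch of how to produce them, assuming the ebm object from the training sketch above:

from interpret import show

# Global explanation: overall importances plus per-feature shape plots
ebm_global = ebm.explain_global(name="EBM global")
show(ebm_global)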

You can also get the features and their importances programmatically, using ebm.feature_names and ebm.feature_importances_.
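
For example, a small sketch that lists the most important features. The attribute names follow the article and the interpret version it used; recent releases expose term_names_ and term_importances() instead:

# Print the ten most important features, largest first
ranked = sorted(
    zip(ebm.feature_names, ebm.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked[:10]:
    print(f"{name}: {score:.4f}")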

2. Feature (x) vs. target (y) relationship:

[Image by author: score vs. feature value plot for one of the features, showing how its contribution varies]

Local Explanations:

Local explanations help us understand what is happening at each individual prediction, i.e., at the local level.

[Images by author: local explanation plots showing each feature’s contribution to individual predictions]
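
A sketch of how such per-prediction plots are generated, with ebm, X_test, and y_test as in the earlier sketches (the choice of five rows is an assumption):

from interpret import show

# Local explanation: per-feature contributions for the first five test rows
ebm_local = ebm.explain_local(X_test[:5], y_test[:5], name="EBM local")
show(ebm_local)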

Get predictions from EBM:

Since the dataset is highly imbalanced, we use the area under the precision-recall curve to evaluate the model.

[Image by author: the EBM’s score on the test set]
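
A sketch of this evaluation step, using scikit-learn’s average precision score as the precision-recall summary (the exact metric implementation in the original is not shown):

from sklearn.metrics import average_precision_score

# Probability of the positive (fraud) class on the held-out set
ebm_scores = ebm.predict_proba(X_test)[:, 1]
print("EBM average precision:", average_precision_score(y_test, ebm_scores))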

Let us compare how EBM fares against XGBoost:

[Image by author: the XGBoost model’s score on the test set]
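
A comparable baseline, sketched with xgboost’s scikit-learn API; the hyperparameters used in the original are not shown, so these are assumptions:

from sklearn.metrics import average_precision_score
from xgboost import XGBClassifier

# Fit a black-box baseline on the same split and score it with the same metric
xgb = XGBClassifier(n_estimators=300, max_depth=6, random_state=42)
xgb.fit(X_train, y_train)
xgb_scores = xgb.predict_proba(X_test)[:, 1]
print("XGBoost average precision:", average_precision_score(y_test, xgb_scores))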

Pretty close, huh!

References:

[1] InterpretML documentation, “Explainable Boosting Machine.” https://interpret.ml/docs/ebm.html

[2] Yin Lou, Rich Caruana, Johannes Gehrke, and Giles Hooker. “Accurate Intelligible Models with Pairwise Interactions.” In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 623–631, 2013.

[3] Machine Learning Group, ULB. “Credit Card Fraud Detection,” Kaggle. https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud


