
Defining Interpretable Features

By Nakul Upadhya | January 2023




A summary of the findings and taxonomy developed by MIT researchers.

In February 2022, researchers at the Data to AI (DAI) group at MIT released a paper called “The Need for Interpretable Features: Motivation and Taxonomy” [1]. In this post, I aim to summarize some of the main points and contributions of these authors and discuss some of the potential implications and critiques of their work. I highly recommend reading the original paper if you find any of this intriguing. Additionally, if you’re new to Interpretable Machine Learning, I highly recommend Christopher Molnar’s free book [2]. While the definition of interpretability/explainability often changes in different publications [1], this book provides a strong foothold in understanding the field.

The core finding of the paper is that even with highly interpretable models like linear regression, non-interpretable features can result in impossible-to-understand explanations (ex. a weight of 4 on a feature named x12 means nothing to most people). With this in mind, the paper contributes a categorization of stakeholders, real-world use cases for interpretable features, a classification of feature qualities, and possible interpretable feature transformations that help data scientists develop understandable features.
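To make this concrete, here is a minimal sketch (using scikit-learn, synthetic data, and entirely made-up feature names) of how the same set of linear-model weights reads very differently depending on whether the features are anonymous columns or interpretable quantities:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data standing in for any tabular prediction task.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 4 * X[:, 0] + 2 * X[:, 1] + rng.normal(size=200)

model = LinearRegression().fit(X, y)

# The same coefficients, presented with opaque vs. readable feature names.
# (The readable names are hypothetical, purely for illustration.)
opaque_names = ["x12", "x7", "x3"]
readable_names = ["prior_referral_count", "age_of_child", "household_size"]

for opaque, readable, coef in zip(opaque_names, readable_names, model.coef_):
    print(f"weight on {opaque}: {coef:.2f}  |  weight on {readable}: {coef:.2f}")
```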

The paper’s first contribution is to expand the set of user types who may benefit from ML explanations, originally proposed by Preece et al. [3], and to define some of their interests. While Preece et al. proposed 4 main types of stakeholders, the authors of this paper expand that list to 5:

  • Developers: The ones who train, test, and deploy ML models and are interested in features to improve model performance.
  • Theorists: The ones who are interested in advancing ML theory and are interested in features to understand their impact on models’ inner workings.
  • Ethicists: The ones who are interested in the fairness of the models and are interested in features to ensure ethical uses of models.
  • Decision Makers: The ones who consume model outputs to complete tasks and make decisions. They are not explicitly interested in features, but they need explanations to ensure their decisions are based on sound information.
  • Impacted Users: Individuals affected by the models and their use who do not directly interact with the models, except perhaps to understand the impact on themselves.

Each of the various users has different needs when it comes to feature engineering, and these needs often conflict with each other. While a decision maker may want the simplest features in the model for better interpretability, a developer may opt for complicated transformations that engineer a feature to be ultra-predictive.

Along with presenting stakeholders, the authors present 5 real-world domains in which they ran into roadblocks when attempting to explain their developed models.

Case Studies

Child Welfare

In this case study, the DAI team collaborated with social workers and scientists (serving as decision-makers and ethicists) to develop an explainable LASSO model with over 400 features that output a risk score for potential child abuse cases. During this process, the DAI team found that most of the distrust surrounding the model stemmed from the features rather than the ML algorithm. One prominent point of confusion was the wording of one-hot encoded categorical features (ex. role of child is sibling == False). Additionally, many of the social workers and scientists had concerns about features that, based on their subject matter expertise, they deemed unrelated to the predictive task at hand.

Education

In the domain of online education, the authors worked on adding interpretability to various decision tasks related to massive open online courses (ex. free courses on Coursera, edX, etc.). While working with various course developers and instructors, the authors found that the most useful features were ones that combined data into abstract concepts that have meaning for the user (such as combining work completion and interaction into a participation feature). Along with this, the researchers found that stakeholders responded better when the data sources of these abstract concepts were easily traceable.

Cybersecurity

In the third domain, researchers worked to develop models that detect Domain Generation Algorithms to help security analysts respond to potential attacks. While many features were engineered to identify these attacks, the raw DNS logs the features were built from were far more useful to analysts, and the challenge the authors faced was how to trace feature values back to the relevant logs.

Medical Records

In the domain of healthcare, researchers worked with six clinicians to develop a model to predict complications after surgery. In this case study, the authors used SHAP values [4] to explain feature contributions but quickly found that SHAP explanations alone were not enough. Continuing the trend from the cybersecurity domain, the authors found that features based on aggregation functions are not as interpretable as the original signal data.
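For readers who have not used SHAP [4] before, computing these per-feature contributions typically takes only a few lines. The sketch below is not the authors’ actual setup; it assumes a tree-based classifier, the open-source shap package, and an invented toy dataset whose column names are purely illustrative:

```python
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier

# Hypothetical post-surgery data; column names are purely illustrative.
X = pd.DataFrame({
    "age": [54, 67, 41, 72, 58, 63],
    "bmi": [27.1, 31.4, 22.8, 29.9, 25.0, 33.2],
    "mean_heart_rate_24h": [78, 92, 70, 88, 75, 95],  # an aggregated feature
})
y = [0, 1, 0, 1, 0, 1]  # 1 = complication after surgery

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# SHAP assigns each feature a contribution to each individual prediction.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# The case study's point: these numbers quantify feature contributions, but if
# a feature like mean_heart_rate_24h hides the underlying vital-sign signal,
# clinicians still cannot trace the explanation back to the raw data.
print(shap_values)
```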

Satellite Monitoring

In this case study, the authors aimed to visualize the results of time-series anomaly detection solutions and developed a tool together with six domain experts. The authors then ran two user studies to evaluate the tool, both with domain experts and with general end-users working on stock price data. In this exercise, the authors discovered that more transparency was needed around the imputation process: most questions were about which values were imputed versus real.

Lessons Learned

There were three key lessons from all of the cases:

  1. Most attention in the literature is placed on selecting and engineering features to maximize model performance, but models that interface with human users and decision-makers need an interpretable feature space to be useful.
  2. To be interpretable, a feature needs to have various properties (discussed later in the taxonomy).
  3. While transformations that bring features to a model-ready state are important, there also needs to be a way to undo these transformations for interpretability.

The authors then used the domains they worked in, along with a large literature search, to develop a taxonomy of feature qualities that matter to the identified users. They organize these qualities into 2 main categories, model-readiness and interpretability, with some properties belonging to both.

Model-ready properties make a feature work well in a model and are what developers, theorists, and ethicists focus on.

Interpretable properties are the ones that make a feature more understandable for users. These properties primarily benefit decision-makers, users, and ethicists.

Model-Ready Feature Properties

  1. Predictive: The feature correlates with the prediction target. This does not imply a direct causal link, however, as the correlation may come from a confounding variable or be entirely spurious.
  2. Model-Compatible: The feature is supported by the model architecture, but may not be predictive or useful.
  3. Model-Ready: The feature is model-compatible and can help generate an accurate prediction. Model-ready features also include ones that have been transformed through methods like normalization and standardization.

Interpretable Feature Properties

  1. Readable: The feature is written in plain text and users can understand what is referred to without looking at any code.
  2. Human-Worded: The feature is both readable and described in a natural, human-friendly way. The authors found that stakeholders in the child welfare space particularly benefitted from this property.
  3. Understandable: The feature refers to real-world metrics that the users understand. This property is heavily dependent on the users’ expertise, but it usually applies to features that have not undergone complex mathematical operations (ex. age is understandable, but log(humidity) may not be).

Both Model-Ready and Interpretable Properties

  1. Meaningful: The feature is one that subject matter experts believe is related to the target variable. Some features may be predictive, but not meaningful due to spurious correlations. Similarly, some features may be meaningful, but not very predictive. However, it is good practice to try to mostly use meaningful features.
  2. Abstract Concepts: The feature is calculated through some domain-expert-defined combination of original features and often represents a generic concept (ex. participation or achievement).
  3. Trackable: The feature can be accurately associated with the raw data it was calculated from.
  4. Simulatable: The feature can be accurately recalculated from raw data if needed. All simulatable features are trackable, but not all trackable features are simulatable. For example, test grade over time may be trackable (it came from raw test grades) but not simulatable, as it could refer to average grades per month or per year, or to grade change.

Along with various properties of interpretable features, the authors also present a few feature engineering methods and how they can contribute to feature interpretability. While some data transformations that make features model-ready can also help with interpretability, this is not often the case. Interpretability transforms aim to bridge this gap, but they can often undo model-ready transforms. This may reduce the predictive ability of the model, but it introduces interpretable feature properties that make the model more trusted by decision-makers, users, and ethicists. A few of these transforms are sketched in code after the list below.

  • Converting to Categorical: When aiming to explain features, convert one-hot encoded variables back to their categorical form.
  • Semantic Binning: When binning numerical data, attempt to bin based on real-world distinctions instead of statistical distinctions. For example, it is more interpretable to bin age by child, young-adult, adult, and senior categories instead of binning by quartiles.
  • Flagged Imputation: If data imputation is used, an extra feature identifying the points containing imputed data can greatly increase trust in your models.
  • Aggregate Numeric Features: When many closely-related metrics are present in the data, it may be beneficial to aggregate them into a single feature to prevent data overload. For example, the authors found that summing up various physical and emotional abuse referrals into a single referral count metric helped decision-makers.
  • Modify Categorical Granularity: When many categories are related to each other, interpretability and performance can be improved by selecting an appropriate summarization of the variable (ex. summarizing the soil zones in the forest covertype dataset into the 8 main geological soil zones).
  • Converting to Abstract Concepts: Apply numerical aggregation and categorical granularity transforms in a hand-crafted formula to generate an abstract concept that subject matter experts can understand.
  • Reverse Scaling and Feature Engineering: If standardization, normalization, or mathematical transforms are applied, interpretability can be increased if these transforms are reversed before analyzing the features. For example, reporting the feature weight on age is more helpful than reporting the weight of sqrt(age).
  • Link to Raw Data: This transform extends reversing scaling and feature engineering. If possible, explicitly display how the engineered feature is calculated from raw data.
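To make a few of these transforms concrete, here is a minimal pandas/scikit-learn sketch. The dataframe, column names, bin edges, and target values are all hypothetical, and in practice the bins and groupings would come from subject matter experts rather than from code:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data; every column name here is illustrative only.
df = pd.DataFrame({
    "age": [4, 17, 35, 70, np.nan],
    "physical_abuse_referrals": [1, 0, 2, 0, 1],
    "emotional_abuse_referrals": [0, 1, 1, 0, 2],
    "role_is_sibling": [0, 1, 0, 0, 1],
    "role_is_parent": [0, 0, 1, 1, 0],
    "role_is_child": [1, 0, 0, 0, 0],
})

# Flagged imputation: record which values were filled in before imputing.
df["age_was_imputed"] = df["age"].isna()
df["age"] = df["age"].fillna(df["age"].median())

# Semantic binning: real-world age groups instead of statistical quartiles.
df["age_group"] = pd.cut(
    df["age"],
    bins=[0, 12, 25, 65, 120],
    labels=["child", "young-adult", "adult", "senior"],
)

# Aggregate numeric features: one referral count instead of several narrow ones.
df["referral_count"] = (
    df["physical_abuse_referrals"] + df["emotional_abuse_referrals"]
)

# Converting to categorical: collapse one-hot columns back for explanations,
# so "role_is_sibling == False" becomes "role of child = parent", and so on.
role_cols = ["role_is_sibling", "role_is_parent", "role_is_child"]
df["role_of_child"] = (
    df[role_cols].idxmax(axis=1).str.replace("role_is_", "", regex=False)
)

# Reverse scaling: fit on standardized features, but report weights in the
# original units (risk change per year of age, per referral) when explaining.
features = ["age", "referral_count"]
y = [0.1, 0.4, 0.6, 0.2, 0.8]  # hypothetical risk scores
scaler = StandardScaler().fit(df[features])
model = LinearRegression().fit(scaler.transform(df[features]), y)
original_unit_weights = model.coef_ / scaler.scale_

print(df[["age_group", "age_was_imputed", "referral_count", "role_of_child"]])
print(dict(zip(features, original_unit_weights)))
```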

While this is not an exhaustive list of all possible transforms, it does provide a great starting point for data scientists: a set of simple steps they can take to ensure they have an interpretable feature space.

Figure 1: Summary of the feature taxonomy proposed by Zytek et al. [1] (Figure from Paper)

Reading this paper, I did have some critiques. For one, while the authors defined various stakeholders, they never provided examples of when impacted users would differ from decision-makers. While we can make some educated guesses (ex. students could be impacted users in the education case, and patients in the healthcare case), the paper does not explain how interpretable features help this group.

The authors themselves also presented some risks of interpretable features. In their example, a developer could maliciously fold the race feature into the abstract concept of socioeconomic factors, effectively hiding that race was used as a predictor in the model. Additionally, the authors concede that many of the proposed interpretability transformations may reduce model performance, and some interpretable feature properties (like readability) are not appropriate when data privacy is important.

Despite these criticisms, it is undeniable that Zytek et al. [1] provide a lot of information about what makes features interpretable, how to achieve interpretability, and why it matters in the first place. Additionally, the proposed transforms are relatively simple to implement, making them friendly to beginner data scientists. Their taxonomy is summarized in Figure 1 above, an image most data scientists would do well to keep handy on their desks.

[1] A. Zytek, I. Arnaldo, D. Liu, L. Berti-Equille, K. Veeramachaneni. The Need for Interpretable Features: Motivation and Taxonomy (2022). SIGKDD Explorations.

[2] C. Molnar. Interpretable Machine Learning (2020). LeanPub

[3] A. Preece, D. Harborne, D. Braines, R. Tomsett, S. Chakraborty. Stakeholders in Explainable AI (2018). Artificial Intelligence in Government and Public Sector page 6.

[4] S. Lundberg, S.I. Lee. A Unified Approach to Interpreting Model Predictions (2017). Advances in Neural Information Processing Systems.

