
Concept Learning: Making your network interpretable
by Lukas Huber, Aug 2022



Photo by eberhard grossgasteiger on Unsplash

Over the last decade, neural networks have shown superb performance across a large variety of datasets and problems. While metrics like accuracy and F1-score are often suitable to measure a model’s ability to learn the underlying structure of the data, the model still behaves like a black box. This often renders neural networks unusable for safety-critical applications, where one needs to know on which assumptions a prediction was made.

Just imagine a radiologist using a program with a neural network backbone to assist in finding a disease on an X-ray image. With traditional methods, it would only output the name of the disease, without any measure of confidence (for a description of how to output true confidence scores, see my last article on neural network calibration). This barely helps the physician, since they have no clue how the decision was made and which part of the image affected it. What they actually want is a list of abnormalities in the image that strongly indicate the predicted disease (e.g. a visible mass in the lung indicating a tumor).

This is where Concept Learning comes in handy. Instead of simply predicting a set of diseases for a given input, it also returns a number of concepts that led to this decision. In this article, you will learn what Concept Learning is and how you can implement it in your very own project.

In the introduction I already used the word concept. But what does it actually mean in this context? To understand this, just imagine how you distinguish different animals. You might use their fur color, weight, speed, number of legs and many more attributes to accomplish this. There you have it! These are all concepts which can be used to describe the prediction of a network. In the literature, the word attribute is often used interchangeably with concept. Obviously, the concepts highly depend on the task at hand.

Explainable concepts can be harnessed in two different ways. Training methods aim to build an inherently explainable model which directly outputs the found concepts. Post-hoc methods, on the other hand, require no special training process but explain an already existing model. I will walk you through both of them!

Training methods

This class of methods focuses on creating architectures that can either be directly analyzed by humans or output a set of extracted concepts for an image. As a consequence, these methods can only be used if you want to train a new model from scratch. Don’t worry if that still sounds a little hard to grasp. I will walk you through a recently published paper.

Standard convolutional neural networks are known to behave like feature extractors. While earlier filters in a CNN tend to extract low-level features like corners and edges, later ones extract high-level features like object parts. Theoretically, these filters are able to explain the internal reasoning globally (i.e. not for a single sample). However, the learned filters are often not understandable by humans and fall short in explaining the internal reasoning for a single example. The latent space encoded by the fully connected layers of the network, moreover, is not interpretable at all.
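If you want to peek at these intermediate features yourself, here is a minimal sketch. It assumes PyTorch and a torchvision ResNet-18 (my choice for illustration, not something the methods below require); the hooked layers and printed shapes are specific to this setup.

```python
import torch
from torchvision import models

# Register forward hooks on an early and a late ResNet block to capture
# the feature maps referred to above as low-level and high-level features.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

features = {}

def save_output(name):
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

model.layer1.register_forward_hook(save_output("early"))  # low-level features
model.layer4.register_forward_hook(save_output("late"))   # high-level features

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))  # dummy input image

print(features["early"].shape)  # torch.Size([1, 64, 56, 56])
print(features["late"].shape)   # torch.Size([1, 512, 7, 7])
```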

Concept Bottleneck models

Figure 1: Architecture of a Concept Bottleneck Model [1]

This is where Concept Bottleneck Models [1] come in handy. They build on a standard CNN backbone (e.g. ResNet or VGG) but reshape one of its fully-connected layers to match a set of concepts predefined for the given dataset. This layer is then referred to as the bottleneck of the model. Each input image needs to be annotated with its class label and a set of concepts describing it. For the bird classification dataset used in the paper, the concepts are binary and describe properties like wing_color:black or head_color:yellow. To train the model, standard backpropagation is used on the class labels. In addition, a concept loss ensures that a node corresponding to a concept is only activated if the concept is actually present in the input image. A concept node is considered activated if its sigmoid confidence surpasses 50%.

Defined more mathematically, the model is split into two parts: a concept extractor c = g(x), which maps the input x to a set of predefined concepts c encoded by the bottleneck layer, followed by a small classification network y = f(g(x)) predicting the class label.
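To make this concrete, here is a minimal PyTorch sketch of a concept bottleneck. This is not the authors’ code from [1]: the ResNet-18 backbone, the concept and class counts, the loss weight lambda_concept and all names are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

class ConceptBottleneckModel(nn.Module):
    """Sketch of the idea: CNN backbone -> concept bottleneck g(x) -> classifier f(g(x))."""

    def __init__(self, num_concepts: int, num_classes: int):
        super().__init__()
        backbone = models.resnet18(weights=None)
        # Replace the last fully-connected layer so it outputs one logit per concept.
        backbone.fc = nn.Linear(backbone.fc.in_features, num_concepts)
        self.concept_extractor = backbone                       # g(x)
        self.classifier = nn.Linear(num_concepts, num_classes)  # f(.)

    def forward(self, x):
        concept_logits = self.concept_extractor(x)              # one logit per predefined concept
        class_logits = self.classifier(torch.sigmoid(concept_logits))
        return class_logits, concept_logits

# Joint objective: class loss + weighted concept loss (weight is a hypothetical hyperparameter).
model = ConceptBottleneckModel(num_concepts=112, num_classes=200)  # counts chosen for illustration
class_loss_fn = nn.CrossEntropyLoss()
concept_loss_fn = nn.BCEWithLogitsLoss()
lambda_concept = 0.5

images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, 200, (4,))
concepts = torch.randint(0, 2, (4, 112)).float()  # binary concept annotations per image

class_logits, concept_logits = model(images)
loss = class_loss_fn(class_logits, labels) + lambda_concept * concept_loss_fn(concept_logits, concepts)
loss.backward()
```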

ProtoPNet

Figure 2: The ProtoPNet architecture [5]

The ProtoPNet proposed by researchers from Duke University is yet another interesting architecture. It reasons about its prediction by dissecting the input image and comparing its parts to prototypical parts learned for each class: the similarity between image patches and these prototypes serves as the evidence behind the final prediction.
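The core operation is a patch-to-prototype comparison. The following simplified sketch is my own illustration, not the code from [5]; the tensor shapes and the number of prototypes are made up. It shows how distances between feature-map patches and learned prototype vectors can be turned into similarity scores that a final linear layer could then weigh per class.

```python
import torch

def prototype_similarities(feature_map: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
    """feature_map: (B, C, H, W); prototypes: (P, C). Returns (B, P) similarity scores."""
    B, C, H, W = feature_map.shape
    P = prototypes.shape[0]
    # One C-dimensional vector per spatial patch of the feature map.
    patches = feature_map.permute(0, 2, 3, 1).reshape(B, H * W, C)
    # Squared L2 distance between every patch and every prototype: (B, H*W, P).
    sq_dists = ((patches.unsqueeze(2) - prototypes.view(1, 1, P, C)) ** 2).sum(dim=-1)
    # Keep only the closest patch per prototype, then map distance to similarity
    # with a log-based function similar to the one used in [5].
    min_sq = sq_dists.min(dim=1).values
    return torch.log((min_sq + 1) / (min_sq + 1e-4))

# Usage: in ProtoPNet these scores feed a final linear layer that predicts the class.
feats = torch.randn(2, 512, 7, 7)   # hypothetical backbone output
protos = torch.randn(10, 512)       # 10 learned prototype vectors (illustrative)
print(prototype_similarities(feats, protos).shape)  # torch.Size([2, 10])
```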

Post-hoc methods

If you want to analyze an already trained model or are restricted to a very specific architecture, you might need to use post-hoc methods, which make few to no assumptions about the underlying model being inspected.

For an in-depth introduction to post-hoc methods, see part two of the concept learning series!

Cheers!

Training methods

[1] Koh, P. W., Nguyen, T., Tang, Y. S., Mussmann, S., Pierson, E., Kim, B., & Liang, P. (2020). Concept Bottleneck Models.

[2] Li, C., Zia, M. Z., Tran, Q.-H., Yu, X., Hager, G. D., & Chandraker, M. (2018). Deep Supervision with Intermediate Concepts.

[3] Wickramanayake, S., Hsu, W., & Lee, M. L. (2021). Comprehensible Convolutional Neural Networks via Guided Concept Learning.

[5] Chen, C., Li, O., Tao, D., Barnett, A., Rudin, C., & Su, J. K. (2019). This Looks Like That: Deep Learning for Interpretable Image Recognition.

[6] Nauta, M., van Bree, R., & Seifert, C. (2021). Neural Prototype Trees for Interpretable Fine-Grained Image Recognition.

Post-Hoc methods

[7] Kim, B., Wattenberg, M., Gilmer, J., Cai, C., Wexler, J., Viegas, F., & Sayres, R. (2018). Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV).

[8] Zhou, B., Sun, Y., Bau, D., & Torralba, A. (2018). Interpretable Basis Decomposition for Visual Explanation.

[9] Bau, D., Zhou, B., Khosla, A., Oliva, A., & Torralba, A. (2017). Network Dissection: Quantifying Interpretability of Deep Visual Representations.

[10] Fong, R., & Vedaldi, A. (2018). Net2Vec: Quantifying and Explaining how Concepts are Encoded by Filters in Deep Neural Networks.

[11] Ghorbani, A., Wexler, J., Zou, J., & Kim, B. (2019). Towards Automatic Concept-based Explanations.

[12] Fang, Z., Kuang, K., Lin, Y., Wu, F., & Yao, Y.-F. (2020). Concept-Based Explanation for Fine-Grained Images and Its Application in Infectious Keratitis Classification.

