
Pixel attribution methods can be found under various names: sensitivity map, saliency map, pixel attribution map, gradient-based attribution methods, feature relevance, feature attribution, and feature contribution. Pixel attribution is a special case of feature attribution, but for images. Feature attribution explains individual predictions by attributing each input feature according to how much it changed the prediction (negatively or positively). The features can be input pixels, tabular data, or words. SHAP, Shapley values, and LIME are examples of general feature attribution methods.

We consider neural networks that output as prediction a vector of length \(C\), which includes regression where \(C=1\). The output of the neural network for image \(I\) is called \(S(I)=[S_1(I),\ldots,S_C(I)]\). All these methods take as input \(x\in\mathbb{R}^p\) (which can be image pixels, tabular data, words, ...) with \(p\) features and output as explanation a relevance score for each of the \(p\) input features: \(R^c=[R_1^c,\ldots,R_p^c]\). The \(c\) indicates the relevance for the c-th output \(S_c(I)\).

There is a confusing amount of pixel attribution approaches. It helps to understand that there are two different types of attribution methods:

Occlusion- or perturbation-based: Methods like SHAP and LIME manipulate parts of the image to generate explanations (model-agnostic).

Gradient-based: Many methods compute the gradient of the prediction (or classification score) with respect to the input features. The gradient-based methods (of which there are many) mostly differ in how the gradient is computed.

Both approaches have in common that the explanation has the same size as the input image (or at least can be meaningfully projected onto it) and that they assign each pixel a value which can be interpreted as the relevance of the pixel to the prediction or classification of that image.
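To make the perturbation idea concrete, here is a minimal occlusion sketch (not the exact procedure used by SHAP or LIME): a grey patch is slid over the image and the drop in the classification score for the target class is recorded. The names `model` and `image` are assumptions, standing for any PyTorch classifier that returns class scores and a preprocessed tensor of shape (channels, height, width); the patch size and fill value are illustrative choices.

```python
import torch

def occlusion_map(model, image, target_class, patch=16, stride=16, fill=0.5):
    """Slide a grey patch over the image and attribute the resulting drop in
    the target class score to the occluded pixels."""
    model.eval()
    with torch.no_grad():
        base_score = model(image.unsqueeze(0))[0, target_class].item()
        _, height, width = image.shape
        relevance = torch.zeros(height, width)
        for top in range(0, height - patch + 1, stride):
            for left in range(0, width - patch + 1, stride):
                occluded = image.clone()
                occluded[:, top:top + patch, left:left + patch] = fill
                score = model(occluded.unsqueeze(0))[0, target_class].item()
                # Pixels whose occlusion hurts the score most get the most relevance.
                relevance[top:top + patch, left:left + patch] = base_score - score
    return relevance
```

Regions whose occlusion causes a large score drop receive a large relevance value; SHAP and LIME build on the same intuition but formalize it with their own sampling and weighting schemes.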

Another useful categorization for pixel attribution methods is the baseline question:

Gradient-only methods tell us whether a change in a pixel would change the prediction. Examples are Vanilla Gradient and Grad-CAM. The interpretation of a gradient-only attribution is: if I were to increase the color values of the pixel, the predicted class probability would go up (for a positive gradient) or down (for a negative gradient). The larger the absolute value of the gradient, the stronger the effect of a change of this pixel.
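As an illustration, here is a minimal Vanilla Gradient sketch in PyTorch, again assuming `model` is an image classifier and `image` a preprocessed input tensor; taking the maximum absolute gradient over the color channels is just one common way to turn the gradient into a saliency map.

```python
import torch

def vanilla_gradient(model, image, target_class):
    """Gradient of the class score with respect to the input pixels."""
    model.eval()
    x = image.unsqueeze(0).clone().requires_grad_(True)  # add a batch dimension
    score = model(x)[0, target_class]                    # S_c(I) for the class of interest
    score.backward()                                     # gradient of S_c w.r.t. the pixels
    grad = x.grad[0]                                     # same shape as the input image
    # One common saliency map: the maximum absolute gradient over the color channels.
    return grad.abs().max(dim=0).values
```

Grad-CAM follows the same gradient-only logic, but computes the gradient with respect to the feature maps of the last convolutional layer instead of the input pixels and then projects the result back onto the image.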

Path-attribution methods compare the current image to a reference image (the baseline), which can be an artificial "zero" image such as a completely grey image. The difference between the actual and the baseline prediction is divided among the pixels. The baseline image can also be multiple images: a distribution of images. This category includes model-specific gradient-based methods such as Deep Taylor and Integrated Gradients, as well as model-agnostic methods such as LIME and SHAP. Some path-attribution methods are "complete", meaning that the sum of the relevance scores for all input features is the difference between the prediction for the image and the prediction for the reference image. For path-attribution methods, the interpretation is therefore always done with respect to the baseline: the difference between the classification scores of the actual image and of the baseline image is attributed to the pixels.
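As a sketch of the path-attribution idea, here is a rough Integrated Gradients approximation with a completely grey baseline, again assuming a PyTorch `model` and a preprocessed `image`; the helper at the end checks how close the attributions come to the completeness property described above (the gap shrinks as `steps` grows).

```python
import torch

def integrated_gradients(model, image, target_class, baseline=None, steps=50):
    """Approximate Integrated Gradients along the straight path from the
    baseline (here: a completely grey image) to the actual image."""
    model.eval()
    if baseline is None:
        baseline = torch.full_like(image, 0.5)  # grey reference, assuming pixels in [0, 1]
    total_grad = torch.zeros_like(image)
    for alpha in torch.linspace(0.0, 1.0, steps):
        point = baseline + alpha * (image - baseline)   # a point on the path
        x = point.unsqueeze(0).requires_grad_(True)
        model(x)[0, target_class].backward()
        total_grad += x.grad[0]
    # Average gradient along the path, scaled by the difference to the baseline.
    return (image - baseline) * total_grad / steps

def completeness_gap(model, image, baseline, target_class, attributions):
    """Difference between the summed attributions and S_c(image) - S_c(baseline);
    for a complete method this gap approaches zero."""
    with torch.no_grad():
        diff = (model(image.unsqueeze(0))[0, target_class]
                - model(baseline.unsqueeze(0))[0, target_class])
    return (attributions.sum() - diff).abs().item()
```

If the baseline is a distribution of images rather than a single reference, one way to proceed is to repeat this computation for several sampled baselines and average the resulting attributions.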

