Grad-CAM is a method dating back to October 2016 that helps explain why Convolutional Neural Network (CNN) based models perform as they do. It requires neither a change to the model's architecture nor any retraining, so it does not force a choice between an accurate model and an explainable one, and it lets the developer check which features the model pays attention to.


When classifying, CNN-based models like VGG perform a series of convolution operations: each layer slides a number of filters (or kernels) over the entire input image, multiplies the pixel values under the filter by the corresponding kernel values and sums the results, producing a (usually) smaller image.
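To make the sliding-filter idea concrete, here is a minimal NumPy sketch of a single convolution without padding. The helper name `convolve2d_valid`, the 5x5 toy image and the 3x3 kernel are made up for illustration; like deep learning frameworks, the sketch does not flip the kernel.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` with no padding and sum the element-wise products."""
    kernel_height, kernel_width = kernel.shape
    out_height = image.shape[0] - kernel_height + 1
    out_width = image.shape[1] - kernel_width + 1
    output = np.zeros((out_height, out_width))
    for i in range(out_height):
        for j in range(out_width):
            patch = image[i:i + kernel_height, j:j + kernel_width]
            output[i, j] = np.sum(patch * kernel)
    return output

# Toy 5x5 "image" and a 3x3 vertical-edge kernel (both hypothetical).
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])

feature_map = convolve2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3): smaller than the 5x5 input
```

A real layer applies many such kernels at once and stacks the resulting maps along a channel axis.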

This operation is repeated with different filters for each convolution layer in the model, and the convolved image is then pooled and normalized to obtain an n-dimensional feature vector. This vector is then fed to a fully connected neural network whose task is to output a score y^c for each class c (dog, cat, llama, etc.) the image could match.

Architecture of VGG16

Let’s say we want to know what the convolution layer A “looks at” in the input image when it classifies it as belonging to class c.

Exploring the gradient

First we need to know what impact each of the d filters in A has on our class prediction y^c. To do that, we compute the gradient of y^c with respect to each component of the k-th feature map of our convolutional layer, A^k (the output of its k-th filter). For the component at (i,j), the gradient is:

\dfrac{\partial y^c}{\partial A^{k}_{i,j}}

This can be read as the question:

“What does a small change in the (i, j) component of my k-th feature map do to my prediction y^c?”
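That "small change" reading can be checked numerically. The sketch below uses a hypothetical toy model (global average pooling followed by one weight per feature map, not the VGG classifier) and nudges a single component to approximate the partial derivative. All names and values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: d = 3 feature maps of size 4x4, stored as A[i, j, k].
A = rng.normal(size=(4, 4, 3))
class_weights = np.array([0.5, -1.0, 2.0])  # one made-up weight per feature map

def class_score(feature_maps):
    """Toy y^c: global-average-pool each map, then take a weighted sum."""
    pooled = feature_maps.mean(axis=(0, 1))
    return float(pooled @ class_weights)

def finite_difference_gradient(feature_maps, i, j, k, eps=1e-6):
    """Approximate d y^c / d A^k_{i,j} by nudging that single value by eps."""
    nudged = feature_maps.copy()
    nudged[i, j, k] += eps
    return (class_score(nudged) - class_score(feature_maps)) / eps

approx = finite_difference_gradient(A, i=1, j=2, k=2)
exact = class_weights[2] / (4 * 4)  # analytic gradient for this toy model
print(approx, exact)
```

In practice the framework computes all these partial derivatives at once by backpropagation rather than by nudging each value.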

How much each filter matters

This is therefore a direct measurement of the role of each component of the k-th feature map of A. We can average these measurements to get the importance of the k-th feature map by doing a Global Average Pooling (i.e. an average over (i, j)).
We therefore get the scalar \alpha_k^c:

\alpha_k^c = \frac{1}{n\times m}\sum^n_{i=1}\sum^m_{j=1}\dfrac{\partial y^c}{\partial A^{k}_{i,j}}
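As a small worked example of this average, assume a layer with d = 2 feature maps of size 2x2 and made-up gradient values stored as `gradient[i, j, k]`:

```python
import numpy as np

# Hypothetical gradient of y^c w.r.t. each component of two 2x2 feature maps.
gradient = np.zeros((2, 2, 2))
gradient[:, :, 0] = [[1.0, 2.0], [3.0, 4.0]]
gradient[:, :, 1] = [[-1.0, 0.0], [0.0, 1.0]]

n, m, d = gradient.shape
# Global Average Pooling: sum over (i, j), divide by n * m, one scalar per map.
alpha = gradient.sum(axis=(0, 1)) / (n * m)
print(alpha)  # first map: (1 + 2 + 3 + 4) / 4 = 2.5, second map: 0.0
```

The second map gets a zero weight here because its positive and negative gradients cancel out.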

Displaying their importance

Then, the heatmap indicating where the network “looks” (called L_{GradCAM}^c, for localisation map) can be obtained by weighting each k-th feature map’s values by these \alpha_k^c coefficients, summing the results and passing them through a ReLU activation function (to get rid of negative values):

L_{GradCAM}^c = ReLU\left(\sum^d_{k=1}\alpha_k^c A^k\right)
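The formula can be traced by hand on a tiny case. The sketch below assumes two 2x2 feature maps and made-up \alpha_k^c values, one of them negative so that the effect of the ReLU is visible:

```python
import numpy as np

# Hypothetical values: d = 2 feature maps of size 2x2, stored as A[i, j, k].
A = np.zeros((2, 2, 2))
A[:, :, 0] = [[1.0, 0.0], [0.0, 2.0]]
A[:, :, 1] = [[0.0, 3.0], [1.0, 1.0]]
alpha = np.array([1.0, -1.0])  # made-up importance of each map for class c

# Weighted sum over the d feature maps.
weighted_sum = np.zeros(A.shape[:2])
for k in range(A.shape[2]):
    weighted_sum += alpha[k] * A[:, :, k]

# ReLU: keep only the locations that speak in favour of class c.
heatmap = np.maximum(weighted_sum, 0.0)
print(weighted_sum)
print(heatmap)
```

Locations where the negatively weighted map dominates end up at zero in the heatmap.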

We can then simply scale up that map of importance thanks to a rather nice property of convolutional networks: filters work locally. This means that the location of detected features is conserved through each layer of our CNN.
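Because locations are conserved, upscaling only has to stretch each cell of the coarse map over the image region it corresponds to. Here is a minimal sketch using nearest-neighbour repetition. The helper name `upscale_nearest` and the toy map are assumptions, and a smoother interpolation is usually preferred for display.

```python
import numpy as np

def upscale_nearest(heatmap, factor):
    """Repeat every cell `factor` times along both axes (nearest-neighbour upscaling)."""
    return np.kron(heatmap, np.ones((factor, factor)))

# Hypothetical 2x2 importance map whose only active cell is the top-right one.
coarse = np.array([[0.0, 1.0],
                   [0.0, 0.0]])
fine = upscale_nearest(coarse, factor=3)
print(fine.shape)  # (6, 6)
print(fine)
```

The active cell stays in the top-right corner of the larger map, which is what lets us overlay it on the input image.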

After scaling up and superimposing L_{GradCAM}^c onto our initial picture, we get:


You can download the working implementation from this repository by running:

git clone
pip3 install -r GradCAM-Keras/requirements.txt
cd GradCAM-Keras

The actual code can be found in, let’s look at it:

First we need to load the input image file into memory and resize it to (224, 224), since that’s what the model expects as input.
We also need to preprocess it. In this case we’re using a VGG19 model pretrained on ImageNet, which means the training images were preprocessed by subtracting the mean of each colour channel. Fortunately Keras takes care of that for us, so we can just use the preprocess_input function. Before doing that, we also need to add a dimension to the image: the batch dimension. The model expects to run on a batch of images, but we only have one image here, so we need to reshape it from (224, 224, 3) to (1, 224, 224, 3) using the np.expand_dims function.

image = np.array(load_img(args.input, target_size=(224, 224)), dtype=np.uint8)
image_processed = preprocess_input(np.expand_dims(image, axis=0))
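To see what these two steps do to the array, here is a NumPy-only sketch. It assumes the VGG preprocessing in Keras follows the "caffe" convention (RGB to BGR, then subtraction of the ImageNet channel means, no scaling). The function name `preprocess_like_vgg` and the constant grey image are stand-ins for illustration, not part of the repository.

```python
import numpy as np

# ImageNet per-channel means in BGR order (assumed "caffe"-style preprocessing).
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype="float32")

def preprocess_like_vgg(batch_rgb):
    """Sketch of the VGG preprocessing: RGB -> BGR, then subtract the channel means."""
    batch_bgr = batch_rgb[..., ::-1].astype("float32")
    return batch_bgr - IMAGENET_BGR_MEANS

# Stand-in for a loaded RGB image: every pixel set to 128.
image = np.full((224, 224, 3), 128, dtype=np.uint8)
batch = np.expand_dims(image, axis=0)  # (224, 224, 3) -> (1, 224, 224, 3)
processed = preprocess_like_vgg(batch)

print(batch.shape, processed.shape)
print(processed[0, 0, 0])  # one pixel after mean subtraction
```

The real preprocess_input should be used in the actual pipeline; the sketch only shows why the values fed to the model are no longer in the 0 to 255 range.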

Once our input is preprocessed, we can instantiate the model and run it on the image:

model = VGG19(include_top=True, input_shape=(224, 224, 3))
prediction = model.predict(image_processed)
predicted_class = np.argmax(prediction)
predicted_class_name = decode_predictions(prediction, top=1)[0][0][1]

Specifying the include_top flag tells Keras we want the model to predict classes and not just extract features from our image. This is what constrains the image shape to (224, 224, 3), since the fully connected network expects a fixed input size. Otherwise we could run the series of convolutions on an image of any shape, since the convolution filters don’t depend on the input size.

We then run the model on the image and look at which class is most activated (predicted_class is our c from earlier). We also get the class label to display it later (it can be “llama”, “sunglasses”, etc.).
We then extract the tensors we’re interested in from the model:

y_c_tensor = model.output[0, predicted_class]
A_tensor = model.get_layer(args.layer).output
gradient_tensor = K.gradients(y_c_tensor, A_tensor)[0]
run_graph = K.function([model.input], [A_tensor, gradient_tensor])

In TensorFlow's graph mode (TensorFlow 1.x), which this code relies on, models are first defined by their static graph (tensors linked with operations) and then the input values “flow” into it (hence the name “tensor flow”). This means that all the objects we’re manipulating don’t yet contain any value, but will eventually when we run the model. That’s why I gave every tensor object the _tensor suffix.

Here, we link our model’s input (the image) to the tensors we want it to output for us, namely the gradient of the output w.r.t. the filters’ output, as well as the filters’ output itself.

A, gradient = run_graph([image_processed])
A, gradient = A[0], gradient[0]  # Keep the first (and only) element of the batch

We can make our input image flow through this new function run_graph and get the actual values. Notice the _tensor suffix is now gone, since these are no longer symbolic tensors but actual output values.

Then, we can compute our \vec{\alpha^c} vector by averaging the gradient over the spatial dimensions, which gives one value per filter:

alpha_c = np.mean(gradient, axis=(0, 1))

Finally, we need to compute the dot product of vector alpha_c and the tensor A:

L_c = np.dot(A, alpha_c)
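A quick shape check helps confirm what this dot product does. The sketch below uses random stand-ins for A and gradient with an assumed shape of (14, 14, 512), and assumes the product is computed with np.dot. It also applies the ReLU from the L_{GradCAM}^c formula with np.maximum, a step the snippet above does not show.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins shaped like a 14x14 layer with 512 filters (an assumption).
A = rng.random((14, 14, 512))
gradient = rng.normal(size=(14, 14, 512))

alpha_c = np.mean(gradient, axis=(0, 1))  # shape (512,): one weight per filter
L_c = np.dot(A, alpha_c)                  # shape (14, 14): sum over the filter axis

# The same sum written out map by map, to show what np.dot contracts.
L_c_explicit = sum(alpha_c[k] * A[:, :, k] for k in range(A.shape[2]))

# ReLU from the Grad-CAM formula: drop the negative contributions.
L_c_relu = np.maximum(L_c, 0.0)

print(alpha_c.shape, L_c.shape)  # (512,) (14, 14)
```

np.dot contracts the last axis of A with alpha_c, so each (i, j) location receives the weighted sum of its 512 filter responses.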

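And there we go! However, that L_c isn’t at the same resolution as our image: the convolutions in VGG19 use 'same' padding and keep the width and height, but each max-pooling layer halves them, so the deeper the layer, the smaller its output. For instance, a layer in the last convolutional block outputs 14×14 maps, which gives a factor of 16. We therefore need to upscale L_c by zooming it by a factor of 224/L_c.shape[0] to end up with a (224, 224) grayscale image.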

Using a bit of SciPy and Matplotlib magic we can output the superimposed heat map:

L_c = zoom(L_c, 224/L_c.shape[0])

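# Left panel: the original image; right panel: the image with the heat map on top.
fig, (ax_original, ax_heatmap) = plt.subplots(nrows=1, ncols=2, dpi=160, figsize=(7, 4))
plt.subplots_adjust(left=0.01, bottom=0.0, right=0.99, top=0.96, wspace=0.11, hspace=0.2)
ax_original.set_title("Original image")
ax_original.imshow(image)
ax_heatmap.set_title("{}th dimension ({}) \nw.r.t layer {}".format(predicted_class, predicted_class_name, args.layer))
ax_heatmap.imshow(image)
ax_heatmap.imshow(L_c, alpha=0.5, cmap="jet")
plt.show()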