What is a Convolutional Neural Network

A Convolutional Neural Network (CNN) is a type of neural network that applies a series of convolutions to an input image to produce an output image. Unlike classic image filtering techniques, however, the coefficients of the filters (or kernels) applied to the image are not fixed by hand: they are tuned using gradient descent or another optimisation algorithm.

Explanation

To perform a convolution, a CNN hovers a number of filters (below in yellow) over the entire input image (below in green). At each position, it multiplies each pixel value under the kernel by the corresponding kernel value and sums the products to obtain one output value. Together these values form a (usually) smaller image (below in pink), as shown below:

Source https://stats.stackexchange.com/a/188216/174997
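The sliding multiply-and-sum described above can be sketched in a few lines of dependency-free Python. This is only an illustration under stated assumptions: the function name convolve2d_valid and the small binary image and kernel are hypothetical, the stride is fixed to 1, and no padding is applied. Strictly speaking it computes a cross-correlation (the kernel is not flipped), which is what deep learning frameworks usually call a convolution.

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (both lists of lists), stride 1, no padding."""
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h, out_w = img_h - k_h + 1, img_w - k_w + 1
    output = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the window under the kernel element-wise, then sum.
            output[i][j] = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(k_h)
                for b in range(k_w)
            )
    return output


# Hypothetical 5x5 binary image and 3x3 kernel, chosen only for illustration.
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = convolve2d_valid(image, kernel)
for row in feature_map:
    print(row)
```

The 5×5 input shrinks to a 3×3 output because a 3×3 kernel only fits in 3 positions along each axis.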

The filters are moved with a given stride in each direction. Here the stride is 1×1, which is the usual choice, but a larger stride such as 2×2 can be used to speed up the convolution, at the cost of a smaller output. This operation is repeated with different filters for each convolutional layer in the model.
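To make the effect of the stride concrete, the number of positions a kernel can take along one axis can be computed with a small helper. This is a sketch that assumes no padding; the name conv_output_size is hypothetical and not part of any library.

```python
def conv_output_size(input_size, kernel_size, stride=1):
    """Number of kernel positions along one axis when no padding is used."""
    return (input_size - kernel_size) // stride + 1


# A 5-wide kernel over a 200-wide axis, with two different strides.
size_stride_1 = conv_output_size(200, 5, stride=1)
size_stride_2 = conv_output_size(200, 5, stride=2)
print(size_stride_1, size_stride_2)
```

Doubling the stride roughly halves the output size along each axis, so the layer computes about a quarter as many output values.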

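The number of trainable parameters N of a convolutional layer with F filters of kernel shape K_1 \times K_2 applied to an input of shape (h, w, C) can be computed using the following formula:

N = K_1 \times K_2 \times F \times C + F

Each filter holds K_1 \times K_2 \times C weights plus one bias, hence the trailing + F. Note that N does not depend on the input height h or width w.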
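The formula translates directly into code. The helper below is a minimal sketch; the name conv2d_param_count and the use_bias flag are illustrative choices rather than a library API, and the layer sizes in the example are arbitrary.

```python
def conv2d_param_count(k1, k2, filters, channels, use_bias=True):
    """Trainable parameters of a 2D convolutional layer: K1*K2*C*F weights + F biases."""
    weights = k1 * k2 * channels * filters
    biases = filters if use_bias else 0
    return weights + biases


# Arbitrary example: 32 filters of shape 3x3 applied to a 3-channel input.
n_params = conv2d_param_count(3, 3, filters=32, channels=3)
print(n_params)
```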

Hence, if the initial RGB image has shape (200, 200, 3), then after a convolution with 10 filters, kernel size 5×5 and stride 1, the output has shape (196, 196, 10), and the layer has 5\times5\times10\times3 + 10 = 760 trainable parameters (each filter has shape (5, 5, 3)), as demonstrated in Keras:

# Import Keras through TensorFlow's public API
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

Sequential([
    Conv2D(input_shape=(200, 200, 3), filters=10, kernel_size=5, strides=1)
]).summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_2 (Conv2D)            (None, 196, 196, 10)      760
=================================================================
Total params: 760
Trainable params: 760
Non-trainable params: 0
_________________________________________________________________

Implementation

Using François Chollet’s Keras framework, a convolutional layer can be added to a model with the Conv2D class.
Its constructor, with the most common arguments, is as follows:

keras.layers.Conv2D(filters, kernel_size, strides=(1, 1), padding='valid', activation=None)

Here:

  • filters is the number of filters to apply in this layer (i.e. F)
  • kernel_size is either a tuple of integers or a single integer specifying the size of the kernel to hover. Odd sizes are customary, since the kernel’s centre then determines the position of the output value in the output image, although Keras also accepts even sizes.
  • strides is either a tuple of integers or a single integer specifying the step, in pixels, by which the kernel moves between two positions
  • padding is a string with either 'valid' or 'same', defaults to 'valid'.
    • 'valid' padding in Keras (and TensorFlow) means no padding, i.e. the kernel stops hovering near the borders to avoid falling over the edge and reading missing values. This means that the border pixels won’t have a convolution value, and hence that the output is a tad smaller in height and width (in the above example, the padding was 'valid' and so the 200×200 image became 196×196).
    • 'same' padding means padding the edges with zeros so that, with a stride of 1, the output image has the same width and height as the input image (hence the name). However, this is not the default in Keras, as it means the border values are computed partly from artificial zeros.
  • activation is either a string like 'tanh', 'relu', 'softmax', etc. (see the Keras documentation for the full list) or a callable such as the functions in keras.activations. By default (None), no activation is applied, which is equivalent to the linear activation function f(x)=x. Each output value of the layer is passed through this function before being passed to the next layer.
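The difference between the two padding modes can be illustrated without Keras. The sketch below covers only the simple case of an odd kernel size and a stride of 1, where 'same' padding amounts to adding (kernel_size - 1) / 2 zeros on every side. The helper names zero_pad and valid_output_size are hypothetical, and how Keras pads in other cases (even kernels, larger strides) is not modelled here.

```python
def zero_pad(image, pad):
    """Surround a 2D image (list of lists) with `pad` rows and columns of zeros."""
    padded_width = len(image[0]) + 2 * pad
    top = [[0] * padded_width for _ in range(pad)]
    middle = [[0] * pad + list(row) + [0] * pad for row in image]
    bottom = [[0] * padded_width for _ in range(pad)]
    return top + middle + bottom


def valid_output_size(size, kernel_size):
    """Output size along one axis for stride 1 and no further padding."""
    return size - kernel_size + 1


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
kernel_size = 3
pad = (kernel_size - 1) // 2  # one ring of zeros for a 3x3 kernel

padded = zero_pad(image, pad)

# Without padding the 3x3 image would shrink to 1x1 ...
unpadded_out = valid_output_size(len(image), kernel_size)
# ... whereas the zero-padded 5x5 image yields a 3x3 output again.
padded_out = valid_output_size(len(padded), kernel_size)
print(unpadded_out, padded_out)
```

The zeros in the outer ring of padded are the "fake data" mentioned above: every output value computed at the border mixes real pixels with them.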