A Convolutional Neural Network, or CNN, is a type of neural network that applies a series of convolutions to an input image to produce an output image. However, contrary to more classic image filtering techniques, the coefficients of the filters (or kernels) applied to the image are not fixed in advance: they can be tuned using gradient descent or any other optimisation algorithm.
To perform a convolution, a CNN slides a number of filters (below in yellow) over the entire input image (below in green). At each position, every pixel value under the kernel is multiplied by the corresponding kernel value and the products are summed to give one pixel of a (usually) smaller output image (below in pink), as shown below:
The filters are moved with a given stride in each direction. Here the stride is 1×1, which is the classic choice, but a larger stride such as 2×2 can be used to accelerate the convolution, at the cost of a smaller output. This operation is repeated with different filters for each convolutional layer in the model.
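As a rough illustration of the sliding operation described above, here is a minimal NumPy sketch for a single-channel image and a single filter, without padding. The function name `conv2d_single` and its arguments are made up for this example and are not part of Keras, which relies on much faster implementations. Like most deep-learning libraries, the sketch does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def conv2d_single(image, kernel, stride=1):
    """Slide `kernel` over a 2-D `image` without padding (illustrative only)."""
    k_h, k_w = kernel.shape
    out_h = (image.shape[0] - k_h) // stride + 1
    out_w = (image.shape[1] - k_w) // stride + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Patch of the image currently covered by the kernel
            patch = image[top:top + k_h, left:left + k_w]
            # Element-wise product, then sum: one output pixel
            output[i, j] = np.sum(patch * kernel)
    return output

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3))
print(conv2d_single(image, kernel).shape)            # (3, 3)
print(conv2d_single(image, kernel, stride=2).shape)  # (2, 2)
```

With a stride of 2 the kernel visits only every other position, which is why the output shrinks from 3×3 to 2×2.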
For a convolutional layer with F filters of kernel shape K_1 \times K_2 applied to an input of shape (h, w, C), the number of trainable parameters N can be computed using the following formula:
N = K_1 \times K_2 \times F \times C + F
Each of the F filters holds K_1 \times K_2 \times C weights, and the final + F accounts for one bias per filter.
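The formula above can be written as a small helper to check layer sizes by hand. This is only a sketch: the function name `conv2d_param_count` is hypothetical, and it assumes one bias per filter, as in the formula.

```python
def conv2d_param_count(k1, k2, filters, channels):
    """Trainable parameters of a 2-D convolutional layer with one bias per filter."""
    weights = k1 * k2 * channels * filters  # each filter holds k1 x k2 x C weights
    biases = filters                        # one bias per filter
    return weights + biases

# 32 filters of size 3x3 applied to a 3-channel (RGB) input
print(conv2d_param_count(3, 3, filters=32, channels=3))  # 896
```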
Hence, if the initial RGB image has shape (200, 200, 3), then after a convolution with 10 filters of kernel size 5×5 and stride 1 the shape will be (196, 196, 10), and that layer will have 5\times5\times10\times3 + 10 = 760 trainable parameters (each filter has shape (5, 5, 3)), as demonstrated in Keras:
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

# A single convolutional layer: 10 filters of size 5×5, stride 1
Sequential([
    Conv2D(input_shape=(200, 200, 3), filters=10, kernel_size=5, strides=1)
]).summary()
```
```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_2 (Conv2D)            (None, 196, 196, 10)      760
=================================================================
Total params: 760
Trainable params: 760
Non-trainable params: 0
_________________________________________________________________
```
Using François Chollet’s Keras framework, a convolutional layer can be added to a model using the keras.layers.Conv2D class. Its constructor, with the most common arguments, is as follows:
keras.layers.Conv2D(filters, kernel_size, strides=(1, 1), padding='valid', activation=None)
filters is the number of filters to apply with this layer (i.e. F).
kernel_size is either a tuple of integers or a single integer specifying the size of the kernel to slide over the image. Odd sizes are the usual choice, since the kernel then has a well-defined centre, which determines the position of the output value in the output image.
strides is either a tuple of integers or a single integer specifying the step by which the kernel moves in each direction.
padding is a string, either 'valid' or 'same'; it defaults to 'valid'.
'valid' padding in Keras (and TensorFlow) means no padding, i.e. the kernel stops near the borders so that it never hangs over the edge, where values would be missing. This means that the border pixels do not get a convolution value of their own, and hence the output is slightly smaller in height and width (in the above example, the padding was 'valid' and so the 200×200 image became 196×196).
'same' padding means padding the edges with zeros so that, with a stride of 1, the output image has the same width and height as the input image (hence the name). However, this is not the default in Keras, as it means the border values are computed partly from fake data.
activation is either a string like 'softmax' etc. (the Keras documentation lists the available names) or a function from keras.activations. By default this is the linear activation function f(x)=x, i.e. no activation is applied. Each output value of the layer is passed through this function before being passed to the next layer.
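To make the effect of strides and padding on the output size more concrete, the sketch below computes the output length along one spatial axis. The helper `conv_output_size` is hypothetical, not a Keras function. It follows the 'valid' and 'same' conventions as described above and assumes no dilation.

```python
import math

def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Output length along one spatial axis for 'valid' or 'same' padding."""
    if padding == "valid":
        # The kernel must fit entirely inside the input
        return (input_size - kernel_size) // stride + 1
    if padding == "same":
        # Zero padding is added so that only the stride shrinks the output
        return math.ceil(input_size / stride)
    raise ValueError(f"unknown padding: {padding!r}")

print(conv_output_size(200, 5))                            # 196
print(conv_output_size(200, 5, padding="same"))            # 200
print(conv_output_size(200, 5, stride=2))                  # 98
print(conv_output_size(200, 5, stride=2, padding="same"))  # 100
```

The first call reproduces the 200×200 to 196×196 reduction from the earlier example. The last one shows that 'same' padding only preserves the input size when the stride is 1.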