Deep Learning: Convolutional Neural Network
A convolutional neural network (CNN) is a type of supervised learning model.
Strengths of Convolutional Neural Networks:
• Feature extraction and classification are integrated into one structure.
• Fully adaptive.
• The network extracts 2-D image features at increasing dyadic scales.
• Relatively invariant to geometric and local distortions in the image.
• Applications: handwritten digit recognition, face detection, and face recognition.
Convolutional neural networks are designed to process two-dimensional (2-D) images. A CNN consists of three types of layers:
(i) Convolution layers
(ii) Sub-sampling layers
(iii) Output or fully-connected (FC) layers
The network architecture is shown in Figure 1.
Figure 1: Network architecture
The network layers are arranged in a feed-forward structure: each convolution layer is followed by a sub-sampling layer, and the last convolution layer is followed by the output layer.
• The convolution and sub-sampling layers are considered 2-D layers.
• The output layer is considered a 1-D layer.
• In a CNN, each 2-D layer has several planes.
• A plane consists of neurons arranged in a 2-D array.
• The output of a plane is called a feature map.
The following are the important steps in designing a CNN model:
1) Pass each input image through a sequence of convolution layers with filters (kernels).
2) Apply pooling.
3) Add fully connected (FC) layers.
4) Apply softmax to classify an object with probability values between 0 and 1.
5) Apply deep learning to train and test the CNN model.
Step 1. Each input image goes through a sequence of convolution layers with filters (kernels).
1) Convolutional stage:
The first layer for extracting features from an input image is the convolution layer. Convolution preserves the relationship between pixels. It is a mathematical operation that takes two inputs: an image matrix and a filter (kernel). The layer consists of 2-D kernels that are convolved with the input signal, producing the output feature maps. Each kernel computes a locally weighted sum of its inputs; in other words, it performs a discrete convolution. As the name suggests, each kernel is convolved with all the feature maps of the preceding layer and generates a 2-D output. To construct the 3-D output feature map of the convolution layer, the outputs of all kernels of that layer are stacked together. In the convolution layer, the total number of parameters equals the number of kernels multiplied by the size of each kernel. Figure 2 shows the multiplication of the image matrix and the kernel matrix.
Figure 2: Multiplication of the image matrix and the kernel (filter) matrix
Consider a 5 x 5 image whose pixel values are 0 and 1, and a 3 x 3 filter matrix, as shown in Figure 3.
Figure 3: Image matrix multiplied by the kernel (filter) matrix
Convolving the 5 x 5 image matrix with the 3 x 3 filter matrix produces the "feature map", as shown in Figure 4.
Figure 4: 3 x 3 output matrix
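The 5 x 5 image and 3 x 3 filter example above can be sketched in plain Python. This is a minimal illustration, not a library implementation; the image and kernel values below are example assumptions, since the actual figure values are not reproduced here.

```python
def convolve2d(image, kernel):
    """Valid 2-D convolution with stride 1, on list-of-lists matrices."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            # locally weighted sum of the pixels under the kernel
            row.append(sum(image[i + u][j + v] * kernel[u][v]
                           for u in range(kh) for v in range(kw)))
        out.append(row)
    return out

image = [[1, 1, 1, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 0],
         [0, 1, 1, 0, 0]]
kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]
feature_map = convolve2d(image, kernel)
# -> [[4, 3, 4], [2, 4, 3], [2, 3, 4]], a 3 x 3 feature map
```

Note that a 5 x 5 input with a 3 x 3 kernel yields a 3 x 3 output, matching the figure.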
Convolving an image with different filters performs operations such as edge detection, blurring, and sharpening. Figure 5 below shows various convolution filters.
Figure 5: Types of kernels
Strides:
Stride is the number of pixels by which the filter shifts over the input matrix. If the stride is 1, the filter moves 1 pixel at a time; if the stride is 2, the filter moves 2 pixels at a time, and so on. Figure 6 below shows convolution with a stride of 2: columns are shifted by 2, so we first take a 3 x 3 window, then shift 2 columns and take the next 3 x 3 window, then shift 2 columns again, and so on.
Figure 6: Stride of 2 pixels
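The effect of stride on the output size follows the standard output-size formula (the formula is standard background, not taken from this post); a small helper makes it concrete:

```python
def conv_output_size(n, k, stride=1, pad=0):
    """Output width for an n x n input and k x k kernel:
    floor((n + 2*pad - k) / stride) + 1."""
    return (n + 2 * pad - k) // stride + 1

# 5 x 5 input, 3 x 3 kernel:
conv_output_size(5, 3, stride=1)  # -> 3 (a 3 x 3 feature map)
conv_output_size(5, 3, stride=2)  # -> 2 (a 2 x 2 feature map)
```

Doubling the stride roughly halves each output dimension.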
Padding:
Sometimes the filter does not fit the input image perfectly. We have two options:
· Pad the image with zeros (zero-padding) so that it fits.
· Drop the part of the image where the filter does not fit. This is called valid padding, which keeps only the valid part of the image.
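Zero-padding can be sketched as follows; this is a minimal illustration assuming a list-of-lists image, with made-up example values:

```python
def zero_pad(image, p):
    """Surround the image with a border of p zeros on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]          # top border
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)  # left/right borders
    padded += [[0] * width for _ in range(p)]         # bottom border
    return padded

zero_pad([[1, 2], [3, 4]], 1)
# -> [[0, 0, 0, 0],
#     [0, 1, 2, 0],
#     [0, 3, 4, 0],
#     [0, 0, 0, 0]]
```

With one pixel of zero-padding, a 5 x 5 input and a 3 x 3 kernel produce a 5 x 5 output, so the feature map keeps the input's size.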
Non-linearity (ReLU):
ReLU stands for Rectified Linear Unit, a non-linear activation function. Its output is f(x) = max(0, x), as shown in Figure 7.
Why ReLU is important: ReLU's purpose is to introduce non-linearity into our ConvNet, since the real-world data we would like our ConvNet to learn is mostly non-linear.
Figure 7: ReLU operation
There are other non-linear functions, such as tanh or sigmoid, that can be used as alternatives to ReLU. Most practitioners use ReLU because it performs better than the other two in terms of efficiency.
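Applied element-wise to a feature map, f(x) = max(0, x) can be sketched as:

```python
def relu(x):
    # f(x) = max(0, x): negative values are clipped to zero, positives pass through
    return max(0, x)

def relu_map(feature_map):
    # apply ReLU element-wise to a 2-D feature map
    return [[relu(v) for v in row] for row in feature_map]

relu_map([[-3, 2], [0, -1]])  # -> [[0, 2], [0, 0]]
```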
Step 2. Pooling layer:
The pooling layer reduces the number of parameters when the images are too large. Spatial pooling, also called subsampling or downsampling, reduces the dimensionality of each map but retains the essential information. Spatial pooling can be of different types:
· Max pooling
· Average pooling
· Sum pooling
Max pooling takes the largest element from the rectified feature map. Average pooling instead takes the average of the elements, and sum pooling takes the sum of all elements in the feature map. In Figure 8, 6 (the maximum value) is taken from the first 2 x 2 window.
Figure 8: Max pooling
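Max pooling can be sketched as below. The feature-map values are example assumptions (the figure's values are not reproduced here), chosen so that 6 is the maximum of the first 2 x 2 window, as described above.

```python
def max_pool(feature_map, size=2, stride=2):
    """Slide a size x size window with the given stride, keeping each window's max."""
    return [[max(feature_map[i + u][j + v]
                 for u in range(size) for v in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, stride)]
            for i in range(0, len(feature_map) - size + 1, stride)]

fm = [[1, 1, 2, 4],
      [5, 6, 7, 8],
      [3, 2, 1, 0],
      [1, 2, 3, 4]]
max_pool(fm)  # -> [[6, 8], [3, 4]]; 6 is the max of the first 2 x 2 window
```

For average or sum pooling, replace `max(...)` with the window's mean or sum.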
Step 3. Fully connected layer: We flatten our matrix into a vector and feed it into a fully connected layer, as in a regular neural network; this layer is called the FC layer.
Figure 9: After the pooling layer, the output is flattened and fed to the FC layer
The feature map matrix is converted to a vector (x1, x2, x3, ...), as shown in Figure 9 above. We combine these features through the fully connected layers to build a model, as shown in Figure 10. Finally, the outputs are classified as cat, dog, car, truck, etc., according to the application.
Step 4. Apply an activation function such as softmax at the output.
Figure 10: Complete CNN architecture
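Softmax turns the FC layer's raw scores into probabilities between 0 and 1 that sum to 1. A minimal sketch, where the class scores are made-up example values:

```python
import math

def softmax(scores):
    # subtract the max score for numerical stability; outputs sum to 1
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]   # example raw scores for, say, cat / dog / car
probs = softmax(scores)    # each value lies in (0, 1); the values sum to 1
```

The largest raw score always gets the largest probability, so the predicted class is unchanged; softmax only rescales the scores into a probability distribution.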
Step 5. Apply deep learning to train and test CNN models.
Summary:
· Provide the image input to the convolution layer.
· Choose parameters, and apply filters with strides and padding if necessary.
· Perform convolution on the image and apply ReLU activation to the resulting matrix.
· Perform pooling to reduce the dimensionality.
· Add as many convolution layers as needed.
· Flatten the output and feed it into a fully connected (FC) layer.
· Use an activation function to output the class and classify images.
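The summary steps above can be strung together into a toy forward pass. This is a didactic sketch with made-up values (a single hand-picked kernel and no training), not a real CNN implementation:

```python
import math

def conv(img, k):
    # valid convolution, stride 1
    kh, kw = len(k), len(k[0])
    return [[sum(img[i + u][j + v] * k[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(len(img[0]) - kw + 1)]
            for i in range(len(img) - kh + 1)]

def relu(fm):
    # element-wise max(0, x)
    return [[max(0, v) for v in row] for row in fm]

def pool(fm, s=2):
    # s x s max pooling with stride s
    return [[max(fm[i + u][j + v] for u in range(s) for v in range(s))
             for j in range(0, len(fm[0]) - s + 1, s)]
            for i in range(0, len(fm) - s + 1, s)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    return [x / sum(e) for x in e]

# toy forward pass: 6 x 6 image -> conv -> ReLU -> max-pool -> flatten -> softmax
img = [[(i + j) % 3 for j in range(6)] for i in range(6)]
kernel = [[1, 0], [0, -1]]
fm = pool(relu(conv(img, kernel)))
flat = [v for row in fm for v in row]   # flatten for the FC stage
probs = softmax(flat)                   # 4 "class" probabilities summing to 1
```

A real network would stack several convolution layers with learned kernels and insert a trained FC layer before the softmax, but the data flow is the same.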