Deep Learning: Convolutional Neural Network


A convolutional neural network (CNN) is a type of supervised learning model.

Convolutional Neural Network strengths:

       Feature extraction and classification are integrated into one structure.

        Fully adaptive.

        The network extracts 2-D image features at increasing dyadic scales.

       Relatively invariant to geometric and local distortions in the image.

       Applications: handwritten digit recognition, face detection, and face recognition.

 

Convolutional neural networks are designed to process two-dimensional (2-D) images.

A CNN consists of three types of layers:

 (i) Convolution layers

(ii) Sub-sampling layers

(iii) Output layer or fully connected (FC) layers.

The network architecture is shown in Figure 1.

                                                Figure 1: Network Architecture

Network layers are arranged in a feed-forward structure: each convolution layer is followed by a sub-sampling layer, and the last convolution layer is followed by the output layer.

       The convolution and sub-sampling layers are considered 2-D layers.

        The output layer is considered a 1-D layer.

        In a CNN, each 2-D layer has several planes.

        A plane consists of neurons arranged in a 2-D array.

      The output of a plane is called a feature map.

The following are the important steps in designing a CNN model:

1)      Each input image goes through a sequence of filters (kernels): the convolution layers.

2)      Pooling

3)      Fully connected layers (FC)

4)      Applying softmax to categorize an object, with output values between 0 and 1.

5)      Applying deep learning to train and test CNN models.

Step 1. Each input image goes through a sequence of filters (kernels) in the convolution layers.

1)      Convolutional stage:

The first layer used for extracting features from an input image is the convolution layer. Convolution preserves the spatial relationship between pixels. It is a mathematical operation that takes two inputs: an image matrix and a filter or kernel.

The layer consists of 2-D kernels that are convolved with the input signal, producing the output feature maps. Mathematically, each kernel computes a locally weighted sum of its inputs, i.e. performs a discrete convolution; as the name suggests, each kernel is convolved with all the feature maps of the preceding layer and generates a 2-D output.

To construct the 3-D output feature map of the convolution layer, the outputs of all kernels of a given layer are stacked together. In the convolution layer, the total number of parameters equals the number of kernels multiplied by the size of each kernel. Figure 2 shows this multiplication.



Figure 2: Multiplication of image matrix and kernel (filter) matrix

Consider a 5 x 5 image whose pixel values are 0 and 1, and a 3 x 3 filter matrix, as shown in Figure 3.


   

                            Figure 3: Image matrix multiplied by the kernel (filter) matrix

Convolving the 5 x 5 image matrix with the 3 x 3 filter matrix then produces an output matrix called the "feature map", as seen in Figure 4.


                                            Figure 4: 3 x 3 Output matrix
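The multiplication above can be sketched in NumPy. This is a minimal illustration, not a library implementation: `conv2d` is a hypothetical helper, and the 0/1 pixel values below are assumed stand-ins for the figure's matrices.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (strictly cross-correlation, as CNNs use)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Locally weighted sum of the patch under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# 5 x 5 binary image and a 3 x 3 filter (values assumed for illustration)
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
feature_map = conv2d(image, kernel)
print(feature_map.shape)  # (3, 3)
```

Note that the feature map is (5 - 3 + 1) x (5 - 3 + 1) = 3 x 3, matching Figure 4.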

Convolving an image with different filters performs operations such as edge detection, blurring, and sharpening. Figure 5 below shows various convolution filters.


                        Figure 5: Types of kernels

Strides: 

               Stride is the number of pixels by which the filter shifts over the input matrix. If the stride is 1, we move the filter 1 pixel at a time; if the stride is 2, we move the filter 2 pixels at a time, and so on. Figure 6 below shows convolution with a stride of 2: the columns shift by 2 each step. First take a 3 x 3 patch, then shift 2 columns and take the next 3 x 3 patch, then shift 2 columns again and take the next 3 x 3 patch.



Figure 6: Stride of 2 pixels
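As a sketch, the stride can be added to the convolution loop. The helper and the 7 x 7 input below are assumed for illustration; the output size follows (n - k) / s + 1:

```python
import numpy as np

def conv2d_stride(image, kernel, stride=1):
    """Valid 2-D convolution with a configurable stride."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride  # top-left corner of the patch
            out[i, j] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

image = np.arange(49.0).reshape(7, 7)  # assumed 7 x 7 input
kernel = np.ones((3, 3))
print(conv2d_stride(image, kernel, stride=1).shape)  # (5, 5)
print(conv2d_stride(image, kernel, stride=2).shape)  # (3, 3)
```

With stride 1 the output is (7 - 3)/1 + 1 = 5 per side; with stride 2 it shrinks to (7 - 3)/2 + 1 = 3.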


Padding:

The filter also may not fit the input image perfectly. We have two options:

·         Pad the image with zeros (zero-padding) so that it fits.

·         Drop the part of the image where the filter does not fit. This is called valid padding, which keeps only the valid part of the image.
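A minimal NumPy sketch of zero-padding; the 5 x 5 pixel values are arbitrary placeholders:

```python
import numpy as np

# Zero-padding surrounds the image with a border of zeros so that
# filter positions falling off the edge are still defined.
image = np.arange(25.0).reshape(5, 5)   # placeholder 5 x 5 image
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

print(image.shape)   # (5, 5)
print(padded.shape)  # (7, 7)
# With a 3 x 3 filter: valid padding gives a (5 - 3 + 1) = 3 x 3 output,
# while the zero-padded input gives (7 - 3 + 1) = 5 x 5, same as the input.
```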

 

           Non-Linearity (ReLU):

ReLU stands for Rectified Linear Unit, a non-linear activation function. Its output is f(x) = max(0, x), as shown in Figure 7.

Why ReLU is important: ReLU's purpose is to introduce non-linearity into our ConvNet, since the real-world data we would like our ConvNet to learn is mostly non-linear.

 



                                 Figure 7: ReLU operation

There are other non-linear functions, such as tanh and sigmoid, that can be used instead of ReLU. Most practitioners use ReLU because it performs better than the other two.
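As a one-line sketch, ReLU clamps negatives to zero elementwise; the sample feature-map values below are assumed:

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: f(x) = max(0, x), applied elementwise."""
    return np.maximum(0, x)

feature_map = np.array([[-3.0, 2.0],
                        [ 1.5, -0.5]])
activated = relu(feature_map)
print(activated)  # every negative entry becomes 0; positives pass through
```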

 

         Step 2: Pooling Layer:

                  The pooling layer reduces the number of parameters when the images are too large. Spatial pooling, also called subsampling or downsampling, reduces the dimensionality of each feature map while retaining the important information. Spatial pooling can be of different types:

·             Max Pooling

·             Average Pooling

·             Sum Pooling

 

Max pooling takes the largest element from the rectified feature map. Average pooling takes the average of the elements, and sum pooling takes the sum of all elements in the feature map. In Figure 8, 6 (the maximum) is taken from the first 2 x 2 window.


                                                                            Figure 8: Max Pooling
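Max pooling can be sketched as below. The 4 x 4 feature map is an assumed example whose first 2 x 2 window has 6 as its maximum, as in the text:

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Keep only the largest value in each size x size window."""
    oh = (feature_map.shape[0] - size) // stride + 1
    ow = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride  # top-left corner of the window
            out[i, j] = feature_map[r:r + size, c:c + size].max()
    return out

fm = np.array([[1, 6, 2, 3],
               [5, 4, 1, 0],
               [2, 1, 9, 8],
               [0, 3, 7, 4]])
pooled = max_pool(fm)
print(pooled)
# [[6. 3.]
#  [3. 9.]]
```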




Step 3: Fully Connected Layer: We flatten our matrix into a vector and feed it into a fully connected layer, like in a regular neural network; we call this layer the FC layer.


Figure 9: After pooling layer, flattened as FC layer


The feature map matrix is transformed into a vector (x1, x2, x3, ...) as shown in Figure 9 above. To build a model, we combine these features together in the fully connected layers, as shown in Figure 10. Finally, an activation function defines the outputs as cat, dog, car, truck, etc., as required by the application.
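A minimal sketch of flattening and one FC layer; the weights here are random placeholders, not trained values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Flatten a 3 x 3 feature map into the vector (x1, x2, ..., x9)
feature_map = np.arange(9.0).reshape(3, 3)
x = feature_map.flatten()

# A fully connected layer is a matrix multiply plus a bias vector.
W = rng.standard_normal((4, 9))  # 4 output neurons, placeholder weights
b = np.zeros(4)
fc_output = W @ x + b

print(x.shape)          # (9,)
print(fc_output.shape)  # (4,)
```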

Step 4: Use an activation function such as softmax at the output.
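Softmax squashes the FC layer's raw scores into probabilities between 0 and 1 that sum to 1; the three scores below are assumed example values for three classes:

```python
import numpy as np

def softmax(z):
    """Convert raw scores into probabilities that sum to 1."""
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # e.g. cat, dog, truck raw scores
probs = softmax(scores)
print(round(float(probs.sum()), 6))  # 1.0
```

The largest raw score always gets the largest probability, so the predicted class is unchanged by softmax.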


         Figure 10: Complete CNN architecture


Step 5: Apply deep learning to train and test CNN models.

Summary:

·         Provide the image input to the convolution layer.

·         Choose parameters, and apply filters with strides and padding if necessary.

·         Perform convolution on the image and apply ReLU activation to the matrix.

·         Perform pooling to reduce dimensionality.

·         Add as many convolution layers as needed.

·         Flatten the output and feed it into a fully connected layer (FC layer).

·         Use an activation function to output the class and classify images.
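The summary steps above can be composed into one forward pass. This is a sketch under assumed sizes with random placeholder weights, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(img, k):
    """Valid 2-D convolution (cross-correlation)."""
    oh, ow = img.shape[0] - k.shape[0] + 1, img.shape[1] - k.shape[1] + 1
    return np.array([[np.sum(img[i:i + k.shape[0], j:j + k.shape[1]] * k)
                      for j in range(ow)] for i in range(oh)])

def relu(x):
    return np.maximum(0, x)

def max_pool(x, s=2):
    """Non-overlapping s x s max pooling."""
    return np.array([[x[i * s:(i + 1) * s, j * s:(j + 1) * s].max()
                      for j in range(x.shape[1] // s)]
                     for i in range(x.shape[0] // s)])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

image = rng.standard_normal((10, 10))   # assumed 10 x 10 input "image"
kernel = rng.standard_normal((3, 3))    # placeholder, untrained filter

h = max_pool(relu(conv2d(image, kernel)))  # conv -> ReLU -> pool: 4 x 4
v = h.flatten()                            # flatten: 16 values
W = rng.standard_normal((3, v.size))       # FC layer for 3 classes
probs = softmax(W @ v)
print(probs.shape)  # (3,)
```

Training would then adjust `kernel` and `W` by backpropagation; here they stay random, so the class probabilities are meaningless but the shapes trace the whole pipeline.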

 





   


 


 
