In my experiment, I train a multilayer CNN for street view house number recognition and check its accuracy on the test data. The coding is done in Python using TensorFlow, a powerful library for implementing and training deep neural networks. The central unit of data in TensorFlow is the tensor. A tensor consists of a set of primitive values shaped into an array of any number of dimensions.
A tensor’s rank is its number of dimensions. Along with TensorFlow, I used some other libraries such as NumPy, Matplotlib, and SciPy.

Firstly, because of limited technical resources, I perform my analysis using only the train and test datasets and omit the extra dataset, which is 2.7GB. Secondly, to make the analysis simpler, I find and delete all data points that have more than 5 digits in the image. For the implementation, I randomly shuffle the validation dataset. I used the pickle file svhn_multi, which I created by preprocessing the data from the original SVHN dataset. I then used the pickle file to train a 7-layer Convolutional Neural Network.
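The filtering and shuffling steps described above can be sketched as follows. This is only a minimal illustration, not the preprocessing code that produced svhn_multi: the sample layout (a list of dicts with a `digits` list), the file names, and the helper name `filter_and_shuffle` are all assumptions made for the example.

```python
import random

MAX_DIGITS = 5  # images with more digits than this are dropped


def filter_and_shuffle(samples, max_digits=MAX_DIGITS, seed=0):
    """Drop samples with more than `max_digits` digits, then shuffle.

    `samples` is assumed to be a list of dicts with a "digits" list;
    this layout is hypothetical and only serves the illustration.
    """
    kept = [s for s in samples if len(s["digits"]) <= max_digits]
    rng = random.Random(seed)  # fixed seed keeps the shuffle reproducible
    rng.shuffle(kept)
    return kept


# Toy data standing in for the real annotations (hypothetical values).
samples = [
    {"name": "1.png", "digits": [1, 9]},
    {"name": "2.png", "digits": [2, 3, 4, 5, 6, 7]},  # 6 digits: removed
    {"name": "3.png", "digits": [5]},
    {"name": "4.png", "digits": [1, 2, 3, 4, 5]},     # exactly 5: kept
]
clean = filter_and_shuffle(samples)
print(len(clean), "samples kept")
```

Using a seeded `random.Random` instance rather than the global generator is a design choice of this sketch; it makes the shuffled order repeatable between runs.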
Finally, I used the test data to check the accuracy of the trained model in detecting numbers from street house number images.

At the very beginning of my experiment, in the first convolution layer I used 16 feature maps with 5×5 filters, producing a 28x28x16 output. ReLU layers are also added after each layer to add more non-linearity to the decision-making process. After the first sub-sampling, the output size decreases to 14x14x16. The second convolution has 32 feature maps with 5×5 filters (512 kernels in total, one for each of the 16 × 32 input-output channel pairs) and produces a 10x10x32 output.
At this point I applied sub-sampling a second time, shrinking the output size to 5x5x32. Finally, the third convolution uses the same filter size with 2048 kernels, which for 32 input channels corresponds to 64 feature maps. It is worth mentioning that the stride is 1 throughout my experiment and the padding is zero, which is why each 5×5 convolution reduces the width and height by 4. During my experiment, I used the dropout technique to reduce overfitting.
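The spatial sizes quoted above (28, 14, 10, 5) can be checked with the usual output-size formula, (input + 2 × padding − kernel) / stride + 1. The sketch below is only an arithmetic check, not the model code. It assumes a 32x32 input (inferred from the 28x28 output of an unpadded 5×5 convolution) and 2×2 sub-sampling with stride 2 (inferred from the halving of the size); the helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or sub-sampling window."""
    return (size + 2 * padding - kernel) // stride + 1


# (layer name, kernel size, stride); the pooling settings are assumptions.
layers = [
    ("conv1", 5, 1),
    ("pool1", 2, 2),
    ("conv2", 5, 1),
    ("pool2", 2, 2),
    ("conv3", 5, 1),
]

size = 32  # assumed input width and height
sizes = {}
for name, kernel, stride in layers:
    size = conv_output_size(size, kernel, stride)
    sizes[name] = size
    print(f"{name}: {size}x{size}")
```

Under these assumptions the third convolution reduces the 5x5 input to a single 1x1 position per feature map, which is then passed on to the later layers.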
Finally, the last layer is a softmax regression layer. Weights are initialized randomly using Xavier initialization, which keeps the weights in the right range: it automatically scales the initialization based on the number of input and output neurons. I then train the network and log the accuracy, loss, and validation accuracy every 500 steps.

Initially, I used a static learning rate of 0.01 but later switched to an exponentially decaying learning rate with an initial value of 0.05, which decays every 10000 steps with a base of 0.95. To minimize the loss, I used the Adagrad optimizer. When I reached a satisfactory accuracy level on the test dataset, I stopped training and saved the learned parameters in the cnn_multi checkpoint file. When I need to perform detection, the model is loaded from that file without being trained again.

Initially, the model produced an accuracy of 89% with just 15000 steps. It is a great starting point, and after a few more hours of training the accuracy would likely reach my benchmark of 90%. However, I added some simple improvements to further increase the accuracy within a small number of learning steps. First, I added a dropout layer after the third convolution layer, just before the fully connected layer.
This makes the network more robust and helps prevent overfitting. Secondly, I introduced exponential decay to the learning rate instead of keeping it constant. This helps the network take bigger steps at first so that it learns fast, but over time, as it moves closer to a minimum, it takes smaller steps. With these changes, the model is now able to produce an accuracy of 92.9% on the test set with 15000 steps.
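To make the learning rate schedule concrete, the sketch below computes an exponentially decayed rate from the values mentioned in the text (initial rate 0.05, base 0.95, decay every 10000 steps). It is written in plain Python and follows the same formula as TensorFlow's `tf.train.exponential_decay`, rate × base^(step / decay_steps). Whether the original run used the stepwise ("staircase") variant is an assumption here, and the function name `decayed_learning_rate` is hypothetical.

```python
def decayed_learning_rate(step, initial_rate=0.05, decay_steps=10000,
                          decay_rate=0.95, staircase=True):
    """Exponentially decayed learning rate at a given training step.

    With staircase=True the rate drops once every `decay_steps` steps
    (an assumption about the setup); otherwise it decays continuously.
    """
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_rate * decay_rate ** exponent


for step in (0, 5000, 10000, 15000):
    print(step, round(decayed_learning_rate(step), 6))
```

With the staircase variant, the rate stays at 0.05 for the first 10000 steps and then drops to 0.0475, so a 15000-step run sees only one decay.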