How to Implement a Deep Neural Network for the CIFAR-10 dataset

Jun 12, 2020

What is CIFAR-10?

CIFAR stands for the Canadian Institute For Advanced Research, and the 10 refers to the number of classes. The dataset is a collection of images commonly used to train machine learning and computer vision algorithms, and it is one of the most widely used datasets in machine learning research. CIFAR-10 contains 60,000 32x32 color images in 10 different classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. There are 6,000 images of each class.

Computer algorithms are used for recognizing objects in photos. CIFAR-10 is a set of images that can be used to teach a computer how to recognize objects. Since the images in CIFAR-10 are low-resolution (32x32), this dataset can allow researchers to quickly try different algorithms. Various kinds of convolutional neural networks tend to be the best at recognizing the images in CIFAR-10.

What are Neural Networks?

Neural networks are among the most powerful models at the core of deep learning. They consist of multiple layers of nodes and come in several families, such as feed-forward networks, sequence models and many more. In this blog, I will explore the CIFAR-10 dataset and implement a simple neural network (a multi-layer perceptron).

The concept of a neural network is actually quite simple. It is loosely modeled on how the human brain functions. The input given to a network can be a picture or a set of variables, and several layers, called hidden layers, sit between the input and the output. The number of hidden layers is a design choice: it is increased or decreased depending on the accuracy the model achieves.

Most of today’s neural nets are organized into layers of nodes, and they’re “feed-forward,” meaning that data moves through them in only one direction. To each of its incoming connections, a node will assign a number known as a “weight.” When the network is active, the node receives a different data item (a different number) over each of its connections and multiplies it by the associated weight. It then adds the resulting products together, yielding a single number.

If that number is below a threshold value, the node passes no data to the next layer. If the number exceeds the threshold value, the node “fires,” which means it sends the number (the sum of the weighted inputs) along all its outgoing connections.
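The weighted-sum-and-threshold behavior of a single node can be sketched in a few lines of plain Python. The function name `node_output` and the example numbers are made up for illustration; real networks usually replace the hard threshold with a smooth activation function.

```python
def node_output(inputs, weights, threshold):
    """Return the weighted sum if it exceeds the threshold, else 0.0."""
    # Multiply each incoming number by the weight on its connection
    # and add the products together.
    total = sum(x * w for x, w in zip(inputs, weights))
    # Below or at the threshold, the node passes nothing on (modeled as 0.0).
    return total if total > threshold else 0.0


inputs = [1.0, 0.5, -1.0]
weights = [0.4, 0.6, 0.2]

# Weighted sum is 0.4 + 0.3 - 0.2 = 0.5
fired = node_output(inputs, weights, threshold=0.3)   # 0.5 > 0.3, node fires
silent = node_output(inputs, weights, threshold=0.9)  # 0.5 < 0.9, node stays silent
```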

When a neural net is being trained, all of its weights and thresholds are initially set to random values. Training data is fed to the bottom layer (the input layer) and it passes through the succeeding layers, getting multiplied and added together in complex ways, until it finally arrives, radically transformed, at the output layer. During training, the weights and thresholds are continually adjusted until training data with the same labels consistently yield similar outputs.

Exploring the Dataset

Training of model

As stated on the CIFAR-10 information page, this dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. Since we are working with color images, each image is stored as numeric values split across three channels: red, green and blue (RGB).
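To make the input size concrete: three color channels of 32x32 pixels give 3,072 numbers per image, which is what a fully connected network sees once the image is flattened. The sketch below uses a placeholder image of zeros in a channel-first layout, which is an assumption about how the data is arranged.

```python
CHANNELS, HEIGHT, WIDTH = 3, 32, 32  # RGB, 32x32 pixels

# Placeholder image: one 32x32 grid of values per color channel.
image = [[[0.0] * WIDTH for _ in range(HEIGHT)] for _ in range(CHANNELS)]

# Flatten channel by channel, row by row, into one long vector.
flat = [value for channel in image for row in channel for value in row]

values_per_image = len(flat)  # 3 * 32 * 32 = 3072
```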

For this project, we use a neural network to solve a classification problem. We will take 10% of the training data as the validation set and keep the remaining 90% as the training set. The optimizer will be stochastic gradient descent with a batch size of 128. Stochastic gradient descent is an approximation of gradient descent: the gradient of the loss function is computed on a small batch of training points instead of the whole set, which is much faster to compute. This random batch sampling introduces noise, which can actually help prevent the algorithm from getting stuck in narrow local minima.
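A minimal sketch of the 90/10 split and the batched loaders, using PyTorch's `random_split` and `DataLoader`. To keep it light, a small stand-in `TensorDataset` with 50,000 rows takes the place of the real CIFAR-10 training images. The seed value and the variable names are assumptions.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Stand-in for the 50,000 CIFAR-10 training images (one dummy feature each).
features = torch.zeros(50_000, 1)
labels = torch.randint(0, 10, (50_000,))
dataset = TensorDataset(features, labels)

# Hold out 10% for validation, keep 90% for training.
val_size = int(0.1 * len(dataset))
train_size = len(dataset) - val_size
generator = torch.Generator().manual_seed(42)  # arbitrary seed for a repeatable split
train_ds, val_ds = random_split(dataset, [train_size, val_size], generator=generator)

batch_size = 128
# Shuffling the training set gives each epoch different random batches.
train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=batch_size * 2)
```

With these numbers, each epoch performs 352 gradient updates (45,000 training images in batches of 128, the last batch being smaller) instead of a single update over the full set.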

Base Model class & Training on GPU

Let’s create a base model class, which contains everything except the model architecture. We will later extend this class to try out different architectures for this image classification problem.

The base class defines methods that compute the loss and accuracy for batches of training and validation data, and that report them at the end of each epoch so we can track the loss at different stages.

We also create an evaluate function, which runs the model on the validation data and returns its progress after each epoch, and a fit function, which updates the weights during each epoch.
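The base class and the two functions described above could look roughly like the sketch below. The names `ImageClassificationBase`, `training_step`, `validation_step` and `validation_epoch_end` are assumptions, cross-entropy is assumed as the loss, and `TinyModel` with random tensors stands in for the real architecture and data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


def accuracy(outputs, labels):
    # Fraction of rows whose highest-scoring class matches the label.
    preds = torch.argmax(outputs, dim=1)
    return (preds == labels).float().mean()


class ImageClassificationBase(nn.Module):
    """Everything except the architecture (hypothetical method names)."""

    def training_step(self, batch):
        images, labels = batch
        return F.cross_entropy(self(images), labels)

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)
        return {"val_loss": F.cross_entropy(out, labels).detach(),
                "val_acc": accuracy(out, labels)}

    def validation_epoch_end(self, outputs):
        # Average the per-batch metrics into one number per epoch.
        losses = torch.stack([x["val_loss"] for x in outputs])
        accs = torch.stack([x["val_acc"] for x in outputs])
        return {"val_loss": losses.mean().item(), "val_acc": accs.mean().item()}


def evaluate(model, val_loader):
    model.eval()
    with torch.no_grad():
        outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)


def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr=lr)
    for _ in range(epochs):
        model.train()
        for batch in train_loader:
            loss = model.training_step(batch)
            loss.backward()        # compute gradients for this batch
            optimizer.step()       # update the weights
            optimizer.zero_grad()  # reset gradients for the next batch
        history.append(evaluate(model, val_loader))
    return history


class TinyModel(ImageClassificationBase):
    # Stand-in architecture, only here so the sketch runs end to end.
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3 * 32 * 32, 10)

    def forward(self, xb):
        return self.linear(xb.reshape(xb.size(0), -1))


torch.manual_seed(0)
data = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,)))
loader = DataLoader(data, batch_size=16)
history = fit(2, 0.01, TinyModel(), loader, loader)
```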

Define some utilities for moving our data & labels to the GPU, if one is available

Since we are using PyTorch, we have the option to use the GPU or the CPU for training and evaluating our model. GPUs are much more efficient than CPUs for calculating and updating weights because they are optimized for matrix calculations. Therefore, we will move our data and model to the GPU if one is available, and fall back to the CPU otherwise.
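One way to write such utilities is sketched below. The helper names `get_default_device`, `to_device` and `DeviceDataLoader` are assumptions. The PyTorch calls they rely on (`torch.cuda.is_available`, `torch.device`, `Tensor.to`) are standard.

```python
import torch


def get_default_device():
    """Pick the GPU if one is available, otherwise the CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")


def to_device(data, device):
    """Move a tensor, or a list/tuple of tensors, to the chosen device."""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)


class DeviceDataLoader:
    """Wrap a data loader so every batch is moved to the device on the fly."""

    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        for batch in self.dl:
            yield to_device(batch, self.device)

    def __len__(self):
        return len(self.dl)


device = get_default_device()
images = torch.zeros(4, 3, 32, 32)
labels = torch.zeros(4, dtype=torch.long)
batch = to_device([images, labels], device)
```

The model itself can be moved the same way, for example `to_device(model, device)`, since `nn.Module` also has a `.to` method.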

In this model, I’m using 4 hidden layers with sizes 1810, 640, 320 and 60.
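A sketch of a network with those four hidden layer sizes follows. The input size of 3,072 comes from flattening a 3x32x32 image and the output size of 10 from the number of classes. The ReLU activation and the class name `CIFAR10Model` are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CIFAR10Model(nn.Module):
    def __init__(self, input_size=3 * 32 * 32, output_size=10):
        super().__init__()
        # Four hidden layers: 1810, 640, 320 and 60 units.
        self.linear1 = nn.Linear(input_size, 1810)
        self.linear2 = nn.Linear(1810, 640)
        self.linear3 = nn.Linear(640, 320)
        self.linear4 = nn.Linear(320, 60)
        # Output layer: one score per class.
        self.linear5 = nn.Linear(60, output_size)

    def forward(self, xb):
        out = xb.reshape(xb.size(0), -1)  # flatten each image to a vector
        out = F.relu(self.linear1(out))   # ReLU is an assumed activation
        out = F.relu(self.linear2(out))
        out = F.relu(self.linear3(out))
        out = F.relu(self.linear4(out))
        return self.linear5(out)          # raw scores (logits), no softmax


model = CIFAR10Model()
logits = model(torch.randn(8, 3, 32, 32))  # batch of 8 random "images"
```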

Defining a function for Losses
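Cross-entropy is a common loss for a multi-class problem like this one, and the sketch below assumes that choice. It is written in plain Python to show the arithmetic for a single example. In PyTorch, `torch.nn.functional.cross_entropy` computes the same quantity averaged over a batch.

```python
import math


def cross_entropy(logits, label):
    """Negative log of the softmax probability assigned to the true class."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# A model with no preference among 10 classes scores ln(10), about 2.30.
uniform_loss = cross_entropy([0.0] * 10, label=3)

# A confident, correct prediction scores close to zero.
confident_loss = cross_entropy([5.0, 0.0, 0.0], label=0)
```

The value ln(10) ≈ 2.30 is a useful reference point: a loss near it means the model is doing no better than guessing.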

Defining a function for Accuracy
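Accuracy here means the fraction of images whose highest-scoring class matches the label. A minimal sketch, with made-up scores for four images and three classes:

```python
import torch


def accuracy(outputs, labels):
    # Index of the largest score in each row is the predicted class.
    preds = torch.argmax(outputs, dim=1)
    return (preds == labels).float().mean().item()


outputs = torch.tensor([[2.0, 0.1, 0.3],   # predicts class 0
                        [0.2, 1.5, 0.1],   # predicts class 1
                        [0.9, 0.8, 0.1],   # predicts class 0 (wrong)
                        [0.1, 0.2, 3.0]])  # predicts class 2
labels = torch.tensor([0, 1, 1, 2])

acc = accuracy(outputs, labels)  # 3 of 4 correct
```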

Training the model

We will make several attempts at training the model, each time trying a different architecture and a different set of learning rates. For example:

Increase or decrease the number of hidden layers
Increase or decrease the size of each hidden layer
Try different activation functions
Try training for a different number of epochs
Try different learning rates in every epoch
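To make the first three variations cheap to try, the architecture can be built from a list of hidden sizes and an activation class. The helper `build_mlp` below is a hypothetical convenience, not part of PyTorch.

```python
import torch
import torch.nn as nn


def build_mlp(hidden_sizes, activation=nn.ReLU, input_size=3 * 32 * 32, output_size=10):
    """Build a fully connected classifier from a list of hidden layer sizes."""
    layers = [nn.Flatten()]  # turn each 3x32x32 image into a 3072-vector
    prev = input_size
    for size in hidden_sizes:
        layers += [nn.Linear(prev, size), activation()]
        prev = size
    layers.append(nn.Linear(prev, output_size))  # one score per class
    return nn.Sequential(*layers)


# The four-hidden-layer configuration used in this post.
wide = build_mlp([1810, 640, 320, 60])

# A smaller variant with a different activation function.
small = build_mlp([256, 64], activation=nn.Tanh)
out = small(torch.randn(4, 3, 32, 32))
```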

We create the model by specifying its input size and output size. Now we are ready to train the model in stages and check the losses as we go. Once the loss gets close to a minimum, we take smaller step sizes such as 0.01 and 0.001 until the loss stops improving.

Let’s look at the epochs at different learning rates: 0.1, 0.01, 0.001, 0.001
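Training in stages with these learning rates can be sketched as a loop over (epochs, learning rate) pairs. The number of epochs per stage is not given in this post, so the value 2 below is a placeholder. Random tensors and a single linear layer stand in for the real data and model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in data and model so the sketch runs on its own.
images = torch.randn(256, 3 * 32 * 32)
labels = torch.randint(0, 10, (256,))
model = nn.Linear(3 * 32 * 32, 10)

# (epochs, learning rate) per stage; the epoch counts are placeholders.
stages = [(2, 0.1), (2, 0.01), (2, 0.001), (2, 0.001)]
batch_size = 128
history = []

for epochs, lr in stages:
    # A fresh SGD optimizer per stage applies the new step size.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for start in range(0, len(images), batch_size):
            xb = images[start:start + batch_size]
            yb = labels[start:start + batch_size]
            loss = F.cross_entropy(model(xb), yb)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Record the loss on the stand-in data after each epoch.
        with torch.no_grad():
            history.append((lr, F.cross_entropy(model(images), labels).item()))
```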

From this we can observe the loss and accuracy reached after each set of epochs. The minimum loss we reached was 1.3272, with an accuracy of 53.4%.

Distribution of Losses Vs Accuracy

Now that we have the best model from these runs, let us evaluate it on the test set.

An accuracy of 54%! The loss and accuracy level off after a certain number of epochs and learning rates, so it is worth changing the number of epochs and the learning rates to find the best model for your dataset. In this case, though, it would not be practical to use this model to classify anything, since the accuracy is quite low. Convolutional neural networks tend to do better on this task. These are deep neural networks that make use of convolution layers, which apply convolutional filters to process images. Much as humans get a better idea of what an image is by looking at the full picture or at parts of it, convolutional networks look at a portion of a picture at a time, which lets them retain more information about an image than looking at one pixel at a time.
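As a small illustration of "looking at a portion of a picture", the sketch below applies one convolution layer from PyTorch to a batch of image-shaped tensors. The choice of 16 filters and a 3x3 window is arbitrary and only for demonstration.

```python
import torch
import torch.nn as nn

# 16 filters, each sliding a 3x3 window across all 3 color channels.
# padding=1 keeps the 32x32 spatial size unchanged.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

images = torch.randn(8, 3, 32, 32)  # batch of 8 random "images"
features = conv(images)             # one 32x32 feature map per filter

# Each filter has 3 * 3 * 3 = 27 weights plus 1 bias, reused at every position.
params_per_filter = conv.weight[0].numel() + 1
```

Unlike the fully connected layers used earlier, the image is not flattened first, so neighboring pixels stay next to each other when the filter looks at them.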