The implementation of AlexNet
So far, I have implemented AlexNet and visualized the model's graph and loss with TensorBoard.
AlexNet
AlexNet, the champion of ImageNet 2012, is one of the most famous models in computer vision; it demonstrated the effectiveness of CNNs on a complicated, large-scale task. This paragraph briefly introduces AlexNet.
The architecture of AlexNet is shown in the figure below.
The layers of each module are listed in detail in the table below.
Module | Layers |
---|---|
1 | Conv - ReLU - Pooling - LRN |
2 | Conv - ReLU - Pooling - LRN |
3 | Conv - ReLU |
4 | Conv - ReLU |
5 | Conv - ReLU - Pooling |
6 | FC - ReLU |
7 | FC - ReLU |
8 | FC - Softmax |
In the sixth and seventh modules (FC6 and FC7), dropout can be added to reduce overfitting.
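In tf.keras this is just a Dropout layer after each of those fully connected layers. A minimal sketch (my own illustration; the 0.5 rate and 4096 units follow the original paper):

```python
import tensorflow as tf

# Sketch: dropout after FC6 and FC7, as in the original AlexNet.
fc6 = tf.keras.layers.Dense(4096, activation=tf.nn.relu)
drop6 = tf.keras.layers.Dropout(rate=0.5)  # randomly drops half of the activations during training
fc7 = tf.keras.layers.Dense(4096, activation=tf.nn.relu)
drop7 = tf.keras.layers.Dropout(rate=0.5)
```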
AlexNet achieved top-1 and top-5 error rates of 37.5% and 17.0% in the ImageNet LSVRC-2010 contest.
AlexNet on CIFAR
When AlexNet is applied to the CIFAR dataset, the only difference is the input size, which in turn changes the parameter sizes of the model.
The original input size of AlexNet is 224x224x3, while a single CIFAR image is 32x32x3.
The modified parameter sizes are summarized in the table below.
Module | Layer | Size |
---|---|---|
1 | Conv1 | Kernel size: [3, 3], Filters: 24 |
1 | ReLU1 | – |
1 | Pooling1 | Pool size: [2, 2], Strides: 2 |
1 | LRN1 | – |
2 | Conv2 | Kernel size: [3, 3], Filters: 96 |
2 | ReLU2 | – |
2 | Pooling2 | Pool size: [2, 2], Strides: 2 |
2 | LRN2 | – |
3 | Conv3 | Kernel size: [3, 3], Filters: 192 |
4 | Conv4 | Kernel size: [3, 3], Filters: 192 |
5 | Conv5 | Kernel size: [3, 3], Filters: 96 |
5 | Pooling5 | Pool size: [2, 2], Strides: 2 |
6 | FC6 | Units: 1024 |
7 | FC7 | Units: 1024 |
8 | FC8 | Units: 10 |
The output size is [batch_size x 10], which matches the one-hot representation of the labels.
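To connect the table to code, here is a minimal sketch of this modified architecture using the tf.keras Sequential API (my own illustration, not the repository's code; the LRN layers of modules 1 and 2 are left out because tf.keras has no built-in LRN layer, see the note in the next section):

```python
import tensorflow as tf

# Sketch of the CIFAR-sized AlexNet; kernel sizes, filter counts, and units follow the table above.
def build_cifar_alexnet():
    return tf.keras.Sequential([
        # Module 1: Conv - ReLU - Pooling (LRN omitted in this sketch)
        tf.keras.layers.Conv2D(24, [3, 3], padding="same", activation=tf.nn.relu,
                               input_shape=(32, 32, 3)),
        tf.keras.layers.MaxPool2D(pool_size=[2, 2], strides=2),
        # Module 2: Conv - ReLU - Pooling (LRN omitted in this sketch)
        tf.keras.layers.Conv2D(96, [3, 3], padding="same", activation=tf.nn.relu),
        tf.keras.layers.MaxPool2D(pool_size=[2, 2], strides=2),
        # Modules 3-5: Conv - ReLU, Conv - ReLU, Conv - ReLU - Pooling
        tf.keras.layers.Conv2D(192, [3, 3], padding="same", activation=tf.nn.relu),
        tf.keras.layers.Conv2D(192, [3, 3], padding="same", activation=tf.nn.relu),
        tf.keras.layers.Conv2D(96, [3, 3], padding="same", activation=tf.nn.relu),
        tf.keras.layers.MaxPool2D(pool_size=[2, 2], strides=2),
        # Modules 6-8: FC - ReLU, FC - ReLU, FC (logits for the 10 classes;
        # softmax is applied inside the loss function later)
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1024, activation=tf.nn.relu),
        tf.keras.layers.Dense(1024, activation=tf.nn.relu),
        tf.keras.layers.Dense(10),
    ])
```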
My implementation
I got really confused by tf, keras, and tf.keras at the beginning, since they all have similar usage. In the end I chose the tf.keras API because I could implement my model by subclassing tf.keras.Model, which I thought would make coding easier and give my program a cleaner style.
I encapsulated the AlexNet model in a class.
```python
class AlexNet(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.conv1 = tf.keras.layers.Conv2D(
            filters=24,                  # Num of the kernels
            kernel_size=[3, 3],          # Size of the receptive field
            padding="same",              # Padding strategy
            activation=tf.nn.relu,       # Activation function
            data_format="channels_last"
        )
        ...
        ...
        ...

    def call(self, input):
        input = tf.reshape(input, [-1, 32, 32, 3])
        x = self.conv1(input)  # [batch_size, 32, 32, 24]
        x = self.lrn1(x)       # [batch_size, 32, 32, 24]
        x = self.pool1(x)      # [batch_size, 16, 16, 24]
        ...
        ...
        ...
```
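One detail worth noting: tf.keras.layers has no built-in LRN layer, so self.lrn1 above has to be built from the low-level op. A minimal sketch of how that could be done inside __init__() (the hyperparameters are placeholders, not necessarily the values used in the repository):

```python
# Wrap the low-level tf.nn.lrn op in a Lambda layer so it can be used as a Keras layer.
# depth_radius / bias / alpha / beta below are placeholder values.
self.lrn1 = tf.keras.layers.Lambda(
    lambda x: tf.nn.lrn(x, depth_radius=2, bias=1.0, alpha=2e-5, beta=0.75)
)
```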
Then instantiate the model and define the optimizer, the placeholders, and the loss.
```python
model = AlexNet()
data_loader = DataLoader()
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
X_placeholder = tf.placeholder(...)
y_placeholder = tf.placeholder(...)
y_pred = model(X_placeholder)
loss = tf.losses.sparse_softmax_cross_entropy(labels=y_placeholder, logits=y_pred)
train_op = optimizer.minimize(loss)
```
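To measure the prediction rate reported in the results below, an accuracy op can be defined in the same graph. A minimal sketch, assuming y_placeholder holds int32 class indices (my own addition, not necessarily how the repository computes it):

```python
# Fraction of samples whose arg-max prediction equals the integer label.
correct = tf.equal(tf.argmax(y_pred, axis=1, output_type=tf.int32), y_placeholder)
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
```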
Finally, train the model for as many iterations as you can within a session.
```python
with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    for batch_index in range(300):
        X, y = data_loader.get_batch(batch_size)
        sess.run(train_op, feed_dict={X_placeholder: X, y_placeholder: y})
```
The code above shows the overall logic of my implementation. More details can be found at https://github.com/GEORGE5961/AI-mini-project/tree/master/AlexNet.
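Since the loss curve below comes from TensorBoard, here is roughly what the scalar logging looks like in TF 1.x (a sketch of the standard summary pattern; the repository may organize it differently):

```python
# Record the loss as a scalar summary and write it out every iteration.
tf.summary.scalar("loss", loss)
merged_summary = tf.summary.merge_all()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter("./logs", sess.graph)  # also saves the graph
    for batch_index in range(300):
        X, y = data_loader.get_batch(batch_size)
        _, summary = sess.run([train_op, merged_summary],
                              feed_dict={X_placeholder: X, y_placeholder: y})
        writer.add_summary(summary, batch_index)  # one scalar point per iteration
    writer.close()
```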
Results I got
I ran the program on my Mac, which doesn't have a discrete graphics card. My laptop roared when running more than 100 iterations, so the most I trained the model was 300 iterations. The best loss I got was above 1.4, while I noticed the ideal loss is much lower than that. For now the best prediction rate is 46.47%, and I think it would be remarkably better with more training time.
I followed Prof. Hua's guide and used TensorBoard to visualize my training process.
The figure below shows the loss curve from training for 300 iterations over more than an hour.
I also got the graph of my model, which is shown in the figure. The graph does not show each layer of the model clearly because I didn't name the layers.
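Giving each layer an explicit name when it is constructed would make the graph much more readable. For example (my own sketch; the name is arbitrary):

```python
# Passing name= when building a layer (e.g. in __init__) gives the TensorBoard
# graph readable node names instead of auto-generated ones.
conv1 = tf.keras.layers.Conv2D(
    filters=24, kernel_size=[3, 3], padding="same",
    activation=tf.nn.relu, name="conv1"
)
```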
Bugs I met and fixed
Distinguish one-hot labels from plain integer labels. One-hot labels have size [batch_size x 10], while plain labels have size [batch_size, ]. For example, tf.losses.sparse_softmax_cross_entropy expects plain integer labels, whereas tf.losses.softmax_cross_entropy expects one-hot labels.
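A minimal illustration of the difference, assuming y_int, y_onehot, and y_pred are already defined (my own example, not code from the repository):

```python
# Plain integer labels, shape [batch_size]: use the sparse loss.
loss_sparse = tf.losses.sparse_softmax_cross_entropy(labels=y_int, logits=y_pred)

# One-hot labels, shape [batch_size, 10]: use the dense loss instead.
loss_dense = tf.losses.softmax_cross_entropy(onehot_labels=y_onehot, logits=y_pred)
```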
If a batch normalization layer follows a convolutional layer whose data format is "channels_last", the axis of batch normalization should be -1; if the format is "channels_first", the axis should be 1.
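For example (a sketch; the filter count is arbitrary):

```python
# channels_last: the channel dimension is the last axis, so axis=-1.
conv_cl = tf.keras.layers.Conv2D(96, [3, 3], padding="same", data_format="channels_last")
bn_cl = tf.keras.layers.BatchNormalization(axis=-1)

# channels_first: the channel dimension is axis 1, so axis=1.
conv_cf = tf.keras.layers.Conv2D(96, [3, 3], padding="same", data_format="channels_first")
bn_cf = tf.keras.layers.BatchNormalization(axis=1)
```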
While using TensorBoard, I only got the graph of the model but no scalars. I examined my code and read several blog posts, but still couldn't make it work. In the end I found that the number of iterations was too small for the scalars to be displayed.