The implementation of AlexNet
So far, I have implemented AlexNet and visualized the model's graph and loss with TensorBoard.
AlexNet
AlexNet, the champion of ImageNet 2012, is one of the most famous models in computer vision; it demonstrated the effectiveness of CNNs on a complicated, large-scale task. This paragraph briefly introduces AlexNet.
The architecture of AlexNet is shown in the figure below.
The layers of each module are listed in detail in the table below.
Module | Layers |
---|---|
1 | Conv - ReLU - Pooling - LRN |
2 | Conv - ReLU - Pooling - LRN |
3 | Conv - ReLU |
4 | Conv - ReLU |
5 | Conv - ReLU - Pooling |
6 | FC - ReLU |
7 | FC - ReLU |
8 | FC - Softmax |
In the sixth and seventh modules (FC6 and FC7), dropout can be added to reduce overfitting.
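In tf.keras this is just a Dropout layer after each of those fully connected layers. A minimal sketch (my own illustration; the 0.5 rate and 4096 units follow the original paper):

```python
import tensorflow as tf

# Sketch: dropout after FC6 and FC7, as in the original AlexNet.
fc6 = tf.keras.layers.Dense(4096, activation=tf.nn.relu)
drop6 = tf.keras.layers.Dropout(rate=0.5)  # randomly drops half of the activations during training
fc7 = tf.keras.layers.Dense(4096, activation=tf.nn.relu)
drop7 = tf.keras.layers.Dropout(rate=0.5)
```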
AlexNet achieved top-1 and top-5 error rates of 37.5% and 17.0% in the ImageNet LSVRC-2010 contest.
AlexNet on CIFAR
When AlexNet is applied to the CIFAR dataset, the only difference is the input size, which in turn changes the parameter sizes of the model.
The original input size of AlexNet is 224x224x3, while a single CIFAR image is 32x32x3.
The modified parameter sizes are summarized in the table below.
Module | Layer | Size |
---|---|---|
1 | Conv1 | Kernel size: [3, 3], Filters: 24 |
1 | ReLU1 | – |
1 | Pooling1 | Pool size: [2, 2], Strides: 2 |
1 | LRN1 | – |
2 | Conv2 | Kernel size: [3, 3], Filters: 96 |
2 | ReLU2 | – |
2 | Pooling2 | Pool size: [2, 2], Strides: 2 |
2 | LRN2 | – |
3 | Conv3 | Kernel size: [3, 3], Filters: 192 |
4 | Conv4 | Kernel size: [3, 3], Filters: 192 |
5 | Conv5 | Kernel size: [3, 3], Filters: 96 |
5 | Pooling5 | Pool size: [2, 2], Strides: 2 |
6 | FC6 | Units: 1024 |
7 | FC7 | Units: 1024 |
8 | FC8 | Units: 10 |
The output size is [batch_size x 10], which matches the one-hot representation of the labels.
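To connect the table to code, here is a minimal sketch of this modified architecture using the tf.keras Sequential API (my own illustration, not the repository's code; the LRN layers of modules 1 and 2 are left out because tf.keras has no built-in LRN layer, see the note in the next section):

```python
import tensorflow as tf

# Sketch of the CIFAR-sized AlexNet; kernel sizes, filter counts, and units follow the table above.
def build_cifar_alexnet():
    return tf.keras.Sequential([
        # Module 1: Conv - ReLU - Pooling (LRN omitted in this sketch)
        tf.keras.layers.Conv2D(24, [3, 3], padding="same", activation=tf.nn.relu,
                               input_shape=(32, 32, 3)),
        tf.keras.layers.MaxPool2D(pool_size=[2, 2], strides=2),
        # Module 2: Conv - ReLU - Pooling (LRN omitted in this sketch)
        tf.keras.layers.Conv2D(96, [3, 3], padding="same", activation=tf.nn.relu),
        tf.keras.layers.MaxPool2D(pool_size=[2, 2], strides=2),
        # Modules 3-5: Conv - ReLU, Conv - ReLU, Conv - ReLU - Pooling
        tf.keras.layers.Conv2D(192, [3, 3], padding="same", activation=tf.nn.relu),
        tf.keras.layers.Conv2D(192, [3, 3], padding="same", activation=tf.nn.relu),
        tf.keras.layers.Conv2D(96, [3, 3], padding="same", activation=tf.nn.relu),
        tf.keras.layers.MaxPool2D(pool_size=[2, 2], strides=2),
        # Modules 6-8: FC - ReLU, FC - ReLU, FC (logits for the 10 classes;
        # softmax is applied inside the loss function later)
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1024, activation=tf.nn.relu),
        tf.keras.layers.Dense(1024, activation=tf.nn.relu),
        tf.keras.layers.Dense(10),
    ])
```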
My implementation
I got really confused by tf, keras, and tf.keras at the beginning, since they all have similar usage. In the end I chose the tf.keras API because I could implement my model by subclassing tf.keras.Model, which I thought would make coding easier and give my program a cleaner style.
I encapsulated the AlexNet model in a class.
```python
class AlexNet(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.conv1 = tf.keras.layers.Conv2D(
            filters=24,                  # Num of the kernels
            kernel_size=[3, 3],          # Size of the receptive field
            padding="same",              # Padding strategy
            activation=tf.nn.relu,       # Activation function
            data_format="channels_last"
        )
        ...
        ...
        ...

    def call(self, input):
        input = tf.reshape(input, [-1, 32, 32, 3])
        x = self.conv1(input)  # [batch_size, 32, 32, 24]
        x = self.lrn1(x)       # [batch_size, 32, 32, 24]
        x = self.pool1(x)      # [batch_size, 16, 16, 24]
        ...
        ...
        ...
```
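One detail worth noting: tf.keras.layers has no built-in LRN layer, so self.lrn1 above has to be built from the low-level op. A minimal sketch of how that could be done inside __init__() (the hyperparameters are placeholders, not necessarily the values used in the repository):

```python
# Wrap the low-level tf.nn.lrn op in a Lambda layer so it can be used as a Keras layer.
# depth_radius / bias / alpha / beta below are placeholder values.
self.lrn1 = tf.keras.layers.Lambda(
    lambda x: tf.nn.lrn(x, depth_radius=2, bias=1.0, alpha=2e-5, beta=0.75)
)
```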
Then instantiate the model and define the optimizer, the placeholders, and the loss.
```python
model = AlexNet()
data_loader = DataLoader()
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
X_placeholder = tf.placeholder(...)
y_placeholder = tf.placeholder(...)
y_pred = model(X_placeholder)
loss = tf.losses.sparse_softmax_cross_entropy(labels=y_placeholder, logits=y_pred)
train_op = optimizer.minimize(loss)
```
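To measure the prediction rate reported in the results below, an accuracy op can be defined in the same graph. A minimal sketch, assuming y_placeholder holds int32 class indices (my own addition, not necessarily how the repository computes it):

```python
# Fraction of samples whose arg-max prediction equals the integer label.
correct = tf.equal(tf.argmax(y_pred, axis=1, output_type=tf.int32), y_placeholder)
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
```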
Finally, train the model for as many iterations as you can within a session.
```python
with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    for batch_index in range(300):
        X, y = data_loader.get_batch(batch_size)
        sess.run(train_op, feed_dict={X_placeholder: X, y_placeholder: y})
```
The code above shows the overall logic of my implementation. More details can be found at https://github.com/GEORGE5961/AI-mini-project/tree/master/AlexNet.
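Since the loss curve below comes from TensorBoard, here is roughly what the scalar logging looks like in TF 1.x (a sketch of the standard summary pattern; the repository may organize it differently):

```python
# Record the loss as a scalar summary and write it out every iteration.
tf.summary.scalar("loss", loss)
merged_summary = tf.summary.merge_all()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter("./logs", sess.graph)  # also saves the graph
    for batch_index in range(300):
        X, y = data_loader.get_batch(batch_size)
        _, summary = sess.run([train_op, merged_summary],
                              feed_dict={X_placeholder: X, y_placeholder: y})
        writer.add_summary(summary, batch_index)  # one scalar point per iteration
    writer.close()
```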
Results I got
I ran the program on my Mac, which doesn't have a discrete graphics card. My laptop roared when running more than 100 iterations, so the most I trained the model was 300 iterations. The best loss I got was above 1.4, while I noticed the ideal loss is much lower than that. For now the best prediction rate is 46.47%, and I think it would be remarkably better with more training time.
I followed Prof. Hua's guide and used TensorBoard to visualize my training process.
The figure below shows the loss curve from training for 300 iterations over more than an hour.
I also got the graph of my model, which is shown in the figure. The graph does not show each layer of the model clearly because I didn't name the layers.
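Giving each layer an explicit name when it is constructed would make the graph much more readable. For example (my own sketch; the name is arbitrary):

```python
# Passing name= when building a layer (e.g. in __init__) gives the TensorBoard
# graph readable node names instead of auto-generated ones.
conv1 = tf.keras.layers.Conv2D(
    filters=24, kernel_size=[3, 3], padding="same",
    activation=tf.nn.relu, name="conv1"
)
```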
Bugs I met and fixed
Distinguish one-hot labels from plain integer labels. One-hot labels have size [batch_size x 10], while plain labels have size [batch_size, ]. For example, tf.losses.sparse_softmax_cross_entropy expects plain integer labels, whereas tf.losses.softmax_cross_entropy expects one-hot labels.
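A minimal illustration of the difference, assuming y_int, y_onehot, and y_pred are already defined (my own example, not code from the repository):

```python
# Plain integer labels, shape [batch_size]: use the sparse loss.
loss_sparse = tf.losses.sparse_softmax_cross_entropy(labels=y_int, logits=y_pred)

# One-hot labels, shape [batch_size, 10]: use the dense loss instead.
loss_dense = tf.losses.softmax_cross_entropy(onehot_labels=y_onehot, logits=y_pred)
```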
If a batch normalization layer follows a convolutional layer whose data format is "channels_last", the axis of batch normalization should be -1; if the format is "channels_first", the axis should be 1.
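For example (a sketch; the filter count is arbitrary):

```python
# channels_last: the channel dimension is the last axis, so axis=-1.
conv_cl = tf.keras.layers.Conv2D(96, [3, 3], padding="same", data_format="channels_last")
bn_cl = tf.keras.layers.BatchNormalization(axis=-1)

# channels_first: the channel dimension is axis 1, so axis=1.
conv_cf = tf.keras.layers.Conv2D(96, [3, 3], padding="same", data_format="channels_first")
bn_cf = tf.keras.layers.BatchNormalization(axis=1)
```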
While using TensorBoard, I only got the graph of the model but no scalars. I examined my code and read several blog posts, but still couldn't make it work. In the end I found that the number of iterations was too small for the scalars to be displayed.