Using a pre-trained VGG model to fine-tune on the CIFAR-10 dataset

This week I learned how to use a pre-trained model on a new dataset: I fine-tuned a pre-trained VGG model on CIFAR-10.

What is a ‘deep’ learning model?

First, I'd like to review the concept of deep learning. In my last weekly report to Prof. Hua, I reckoned that AlexNet, at only 8 layers, could not count as a deep learning model. Prof. Hua pointed out that this was not correct. I then read a related blog and got a clearer idea of what 'deep' learning means.

The main idea is that if a model uses hierarchical feature learning, identifying lower-level features first and then building on them to identify higher-level features (e.g. with convolution filters), then it is a deep learning model.

As an extreme example, even a neural network with 100 fully-connected layers wouldn't be a deep learning model, because stacked fully-connected layers alone don't learn features hierarchically.

So AlexNet is absolutely a deep learning model.

Transfer learning

Transfer learning is quite useful for AI programmers. It is common to spend a couple of weeks training a model on a dataset, even with GPUs. So why not reuse parameters pre-trained by others? We just need to fine-tune the pre-trained model, sometimes changing only the final layer, and we save a large amount of time!

The next step is to select a model that suits your dataset. Several models have emerged since AlexNet in 2012, and they differ in size and accuracy. The image below, taken from a paper, is an analysis of deep neural network models for practical applications.

If your task is to classify 10 or 100 classes, I think most models in the image will work. But if you want to classify images into 1000 categories, some models may perform poorly.

By the way, transfer learning can also be applied to feature extraction: for example, leverage the lower layers of a pre-trained AlexNet to extract low-level features of an image and then redesign the rest of the model, as in the sketch below.
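Here is a rough sketch of the idea, using the Tensornets library that I introduce below. I use VGG19 because, as far as I know, Tensornets does not include AlexNet, and I assume its stem=True option behaves as its README describes:

```python
import tensorflow as tf
import tensornets as nets

images = tf.placeholder(tf.float32, [None, 224, 224, 3])

# stem=True (as I understand the Tensornets option) drops the final
# classifier layers, so this tensor is a convolutional feature map
# that new, task-specific layers can be stacked on.
features = nets.VGG19(images, stem=True)
```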

VGG19

Between VGG and ResNet-18, I chose VGG. The reasons are quite simple. First, this was my first time using a pre-trained model, so I picked a relatively simple one. Second, VGG is good enough to classify a dataset like CIFAR-10. Finally, VGG sounds like UGG, a famous brand well-known for its boots.

VGG has two versions, VGG16 and VGG19, which differ in the number of layers: 16 for one and 19 for the other. This time I chose VGG19, since most people use VGG16.

VGG is really simple if you are familiar with CNN architecture.

Model implementation

Tensornets is a library that includes many popular models. You can readily install it with pip (pip install tensornets).

There are 3 key issues in the implementation; the rest is almost the same as ordinary TensorFlow code. The native input shape of VGG is (None, 224, 224, 3), but the input shape of CIFAR-10 is (None, 32, 32, 3), which means CIFAR-10 images have far fewer pixels. The native output size of VGG is (None, 1000), but the output size for CIFAR-10 is (None, 10).

  1. What to do about the mismatch between your dataset's input shape and the model's original input shape? Enlarge your inputs by padding with zeros? Or change the size of the model?

  2. What to do about the mismatch between your dataset's output shape and the model's original output shape? Add a layer? Or change the final layer?

  3. How to load the pre-trained weights?

Now we begin to resolve them one by one.

As to the first issue, I read many blogs at the very beginning, but none of them mentioned it. Finally one blog told me to use the skimage.transform package.

The package re-scales the image with an interpolation algorithm instead of just filling in zeros.
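For example, a minimal sketch of rescaling a batch of CIFAR-10 images to VGG's 224x224 input (upscale_batch is my own helper name):

```python
import numpy as np
from skimage.transform import resize

def upscale_batch(images):
    # Rescale a batch of 32x32x3 CIFAR-10 images to 224x224x3.
    # resize() interpolates pixel values instead of padding with zeros,
    # and returns floats scaled to [0, 1].
    out = np.empty((images.shape[0], 224, 224, 3), dtype=np.float32)
    for i, img in enumerate(images):
        out[i] = resize(img, (224, 224), mode='reflect')
    return out
```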

As to the second issue, just pass the parameter classes=10 to the function nets.VGG19().

As to the third issue, a single line is enough, as the sketch below shows.
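Putting the last two pieces together, here is a minimal sketch of the setup, following the Tensornets README as I understand it (the loss and optimizer choices are my own):

```python
import tensorflow as tf
import tensornets as nets  # pip install tensornets

# Placeholders for the rescaled images and integer labels.
inputs = tf.placeholder(tf.float32, [None, 224, 224, 3])
labels = tf.placeholder(tf.int64, [None])

# classes=10 swaps VGG19's original 1000-way final layer
# for a 10-way one that matches CIFAR-10.
model = nets.VGG19(inputs, is_training=True, classes=10)

loss = tf.losses.sparse_softmax_cross_entropy(labels, model.logits)
train_op = tf.train.AdamOptimizer(1e-5).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # The single line that loads the pre-trained ImageNet weights:
    sess.run(model.pretrained())
```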

To be frank, the implementation is not hard if you are skilled in TensorFlow and lucky enough to find a good tutorial blog (so many blogs on the Internet are nothing but plagiarism and a waste of time!).

The result

I think it is time to leverage GPUs for training. My poor computer is not up to such a deep model: it ran painfully for more than 2 hours without finishing, and I gave up with regret.

I visualized the graph and printed the inner architecture of the model. It shows the exact architecture of VGG19.

I also tried to evaluate the validation data without any additional training, but that was also a grind, because the validation data still has to pass through the whole deep network, which means a huge amount of computation. Roughly, the evaluation looks like the sketch below.
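This continues from the sketches above (inputs, labels, model, upscale_batch); x_val and y_val stand for a batch of CIFAR-10 validation images and labels:

```python
import tensorflow as tf

# Accuracy of the (not yet fine-tuned) model on one validation batch.
correct = tf.equal(tf.argmax(model, axis=1), labels)
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(model.pretrained())
    # Every image still has to pass through all 19 layers, hence the grind.
    acc = sess.run(accuracy, {inputs: upscale_batch(x_val), labels: y_val})
    print('validation accuracy: %.4f' % acc)
```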

To summarize, the model is finished, but it cannot run on my computer.

Questions

I still don't know whether I should enlarge the input shape to fit the model or, the other way around, shrink the model to fit the input.

Reference

  1. A. Canziani, A. Paszke, and E. Culurciello. An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678, 2016.
  2. K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  3. https://towardsdatascience.com/transfer-learning-in-tensorflow-9e4f7eae3bb4
  4. https://towardsdatascience.com/how-deep-should-it-be-to-be-called-deep-learning-a7b1a6ab5610