YOLO is all you need in object detection

5 minute read

The title is obviously inspired by a famous paper, "Attention Is All You Need". I'll talk about how YOLO was used in a real project.

The description of the task

In a medical research project, real X-ray images need to be processed to help judge whether hydrarthrosis of the knee has occurred.

Here we focus on only one part of the project: detecting and cropping the knees in the images.

The mission seemed quite simple, because state-of-the-art object detection methods perform well even on videos, not to mention still images. What's more, the features of the knee are so distinctive that even traditional image processing methods (no deep neural network) might work. At the very beginning, we thought so too.

Talk is cheap, show me the code. The difficulties that came along were beyond expectation.

Difficulties ahead

  1. The number of images was too small. In a Chinese hospital, images are organized at the person level, which means each person has a directory containing all of his or her images, such as X-ray images, MRI scans, and so on. Therefore, selecting X-ray images takes the doctors a long time, and as you know, Chinese doctors are busy with all kinds of work. So only 20 or so images were available at first. As we all know, a deep learning model needs to be fed large quantities of images for training.

  2. The quality of the images is bad. The X-ray shots were taken casually, so few of them are standard. The images below show some real images from a hospital.

  3. An image may include more than one leg😤. An experienced doctor is competent to make a diagnosis whatever the image looks like. For a machine, however, it matters a lot.

The first method I tried

I tried traditional methods while waiting for more images. Motivated by an article [1], I first tried a pixel-level template matching method.

The idea is intuitive: given a template, say, a knee from a standard image, find the best-matching part in a test image. This method is meant for finding a part that is exactly the same as the template. The knees in our images, however, differ a lot.
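To make the idea concrete, below is a minimal sketch of pixel-level template matching with OpenCV. The file names are placeholders, and the normalized cross-correlation score is my assumption; the article [1] may have used a different variant.

```python
import cv2

# Load the test X-ray and the knee template in grayscale.
# (File names are illustrative, not from the original project.)
image = cv2.imread("xray_test.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("knee_template.png", cv2.IMREAD_GRAYSCALE)
h, w = template.shape

# Slide the template over the image and score every position;
# normalized cross-correlation is fairly robust to brightness changes.
result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

# Take the best-scoring region as the knee's bounding box and crop it.
x, y = max_loc
knee = image[y:y + h, x:x + w]
cv2.imwrite("knee_crop.png", knee)
print(f"match score: {max_val:.3f}, box: ({x}, {y}, {w}, {h})")
```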

Anyway, since no more images were available, I gave this method a try.

The result was fairly good: nearly half of the images were successfully detected. But this method failed on images with multiple legs. In addition, pixel-level features depend heavily on the template image, which means an image with a similar pixel distribution will get a good result while the others may be disappointing. To sum up, it is not a general method.

The second method I tried

With no more images coming, I came up with another traditional method. I noticed that the joint gap (the "crack") of a knee is a distinctive feature that makes the knee stand out. If we can find the crack, we can approximately draw a box over it, which gives the knee's position. Finding features like the crack is called edge detection, which is broadly used in applications like finding the barcode on a product.

My processing steps (a code sketch follows the list):

  1. Gaussian blur
  2. Canny edge detection
  3. Dilation
  4. Close operation with a cross-shaped structuring element
  5. Find the biggest connected region, which is supposed to be the knee
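Here is a sketch of how these steps might look with OpenCV; the thresholds, kernel sizes, and iteration counts are illustrative guesses, not the fine-grained configuration the project ended up with.

```python
import cv2
import numpy as np

image = cv2.imread("xray_test.png", cv2.IMREAD_GRAYSCALE)

# 1. Gaussian blur suppresses noise before edge detection.
blurred = cv2.GaussianBlur(image, (5, 5), 0)

# 2. Canny extracts edges; the joint gap shows up as strong edges.
edges = cv2.Canny(blurred, 50, 150)

# 3. Dilation thickens edges so broken segments join up.
dilated = cv2.dilate(edges, np.ones((3, 3), np.uint8), iterations=2)

# 4. Morphological closing with a cross-shaped element fills small gaps.
cross = cv2.getStructuringElement(cv2.MORPH_CROSS, (7, 7))
closed = cv2.morphologyEx(dilated, cv2.MORPH_CLOSE, cross)

# 5. Keep the largest connected region, assumed to be the knee.
contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
largest = max(contours, key=cv2.contourArea)
x, y, w, h = cv2.boundingRect(largest)
cv2.imwrite("knee_crop.png", image[y:y + h, x:x + w])
```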

Tuning the parameters is an art. Is Gaussian blur needed or not? Canny or Sobel? Dilate or erode? How many iterations when dilating or eroding? Close or open operation? What about the shape of the structuring element? Trying different combinations eventually leads to a fine-grained configuration.

The final results were also pretty good, but not good enough for my project.

The final try: YOLO

Thank goodness, 280 more images were provided this month. Although the number is below our expectations, 280 is better than none. We then followed a guide [2] and tried YOLO.

YOLO (You Only Look Once) is a state-of-the-art object detection model. I was not sure whether 280 images were enough for training.

The process is summarized as follows:

  1. Label the data.
  2. Create a virtual environment using Conda or pyenv, and install packages like tensorflow-gpu with pip.
  3. Connect to the remote machine over SSH.
  4. Use SFTP to upload files and keep them in sync.
  5. Train the model on a GPU.
  6. Test.

LabelImg was used to label the new class. The training took two hours on a 1080 Ti GPU. Many bugs were encountered and fixed. The result was amazing.
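As a rough illustration of the test step, here is a hedged sketch of running a trained YOLO model through OpenCV's DNN module. The .cfg/.weights file names are placeholders, and the project itself used a TensorFlow-based implementation from the guide [2], so this only shows the shape of the inference logic.

```python
import cv2
import numpy as np

# Placeholder file names; the real project used a TensorFlow port.
net = cv2.dnn.readNetFromDarknet("yolo-knee.cfg", "yolo-knee.weights")
layer_names = net.getUnconnectedOutLayersNames()

image = cv2.imread("xray_test.png")
h, w = image.shape[:2]

# YOLO expects a square, normalized input blob.
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True)
net.setInput(blob)
outputs = net.forward(layer_names)

# Each detection row is [cx, cy, bw, bh, objectness, class scores...],
# with coordinates normalized to the image size.
for output in outputs:
    for det in output:
        if det[4] > 0.5:  # objectness threshold
            cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
            x, y = int(cx - bw / 2), int(cy - bh / 2)
            print(f"knee at ({x}, {y}, {int(bw)}, {int(bh)}), "
                  f"confidence {det[4]:.2f}")
```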

Some hyper-parameters are listed below.

| Hyper-parameter | Value |
| --------------- | ----- |
| Batch size      | 5     |
| Learning rate   | 0.001 |
| Epochs          | 100   |
| Momentum        | 0.9   |
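For reference, here is a minimal, hedged sketch of how these hyper-parameters map onto a Keras training call. The tiny model and the random data are stand-ins, not the project's YOLO network; only the optimizer settings, batch size, and epoch count mirror the table.

```python
import numpy as np
import tensorflow as tf

# A throwaway model standing in for the real YOLO network.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Learning rate 0.001 and momentum 0.9, as in the table above.
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9),
    loss="binary_crossentropy",
)

# Random stand-in data; batch size 5 and 100 epochs match the table.
x = np.random.rand(20, 4).astype("float32")
y = np.random.randint(0, 2, size=(20, 1)).astype("float32")
model.fit(x, y, batch_size=5, epochs=100, verbose=0)
```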

Lovely bugs:

  1. The path separator on Linux is '/' while on Windows it is '\'. One of my local machines runs Windows, so a conflict occurred (see the pathlib sketch after this list).

  2. After training, the trained weights file has to be put in the right directory before testing.

  3. Out-of-memory errors. Just reduce the batch size.

  4. The guide is not detailed; a deep understanding of how the code works is needed.
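For bug 1, a minimal sketch of the usual fix: build paths with pathlib (or os.path.join) instead of hard-coding a separator, so the same code runs on both Linux and Windows. The file name here is illustrative.

```python
from pathlib import Path

# pathlib picks the right separator for the current OS.
weights_file = Path("model") / "weights" / "yolo_final.h5"
print(weights_file)
# -> model/weights/yolo_final.h5 on Linux
# -> model\weights\yolo_final.h5 on Windows
```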

References:

[1] https://club.6parker.com/enter1/index.php?app=forum&act=threadview&tid=14301264

[2] https://www.e-learn.cn/content/python/724590