Download and pre-process Images for YOLO training

Download training Images

This post walks you through collecting a training image set for your machine learning image processing projects. It is part of the YOLO object detection tutorial series. I thought it best to keep this part simple and separate from the actual training process.

The easiest way to download training images in bulk is to automate a Google image search. There are many ways to do so, most of them using Google APIs and Python scripts. In the interest of keeping it simple, I have decided to go with a Google Chrome extension that lets us download the pictures from a Google search. This kick-ass Chrome extension is a godsend for image projects, as it removes one of the main bottlenecks: collecting sample training images. The tool is fairly self-explanatory, and the downloaded images arrive as a zip file on your computer. Unzip each zip file into its own folder.
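If you end up with several zip files, the unzipping can be scripted. The following is a minimal sketch using only the Python standard library. It assumes all the zip files sit in one folder, and the function name and folder names are hypothetical placeholders, not something the extension provides.

```python
import zipfile
from pathlib import Path


def unzip_into_folders(download_dir, output_dir):
    """Extract every .zip in download_dir into output_dir/<archive name>/."""
    download_dir = Path(download_dir)
    output_dir = Path(output_dir)
    extracted = []
    for archive in sorted(download_dir.glob("*.zip")):
        # One folder per archive, named after the zip file (without .zip).
        target = output_dir / archive.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted
```

Calling unzip_into_folders("downloads", "images") would then leave one folder per zip file under images, for example one per bird species if that is how you named the downloads.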

Try downloading images of individual species of birds, and also some images that contain several species together, which helps make the trained model more robust.

Bounding Boxes

Once the images are downloaded, the next step is to pre-process them to get them ready for training. This step includes creating “bounding boxes” for each of these images, and creating a helper file corresponding to each image file to tell the model what’s contained in those bounding boxes. Basically we are drawing rectangles around the birds and saying which bird each one is. Each row of the helper file describes one bounding box, and the first number in the row is the zero-based position of that class in the obj.names file. For example:
3 0.348476 0.595122 0.672561 0.809756
3 0.629878 0.342927 0.295122 0.615610
3 0.780793 0.193659 0.303049 0.377561
In the example above, the first number in each line identifies the class of bird contained in that bounding box. The four numbers following it describe the box: the x and y coordinates of its center, then its width and height, each expressed as a fraction of the image width or height. The origin (0,0) is the top left corner of the image and (1,1) is the bottom right corner. This is the format that the YOLO model expects the training data to be in: an image file, and a corresponding helper file with the bounding box details.
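To make the numbers more concrete, here is a small sketch that turns one helper-file line into pixel coordinates. It assumes the usual Darknet YOLO convention: the four values are the box center x, center y, width and height, each as a fraction of the image size, with (0,0) at the top left of the image. The function name is a hypothetical helper, and you need to supply the image size yourself.

```python
def yolo_to_pixel_box(line, image_width, image_height):
    """Convert one label line to (class_id, left, top, right, bottom) in pixels."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    x_center, y_center, width, height = (float(p) for p in parts[1:])
    # Scale the normalized center/size values back up to pixels,
    # then step half a box width/height away from the center.
    left = (x_center - width / 2) * image_width
    top = (y_center - height / 2) * image_height
    right = (x_center + width / 2) * image_width
    bottom = (y_center + height / 2) * image_height
    return class_id, left, top, right, bottom
```

For instance, yolo_to_pixel_box("3 0.348476 0.595122 0.672561 0.809756", 640, 480) would give the corners of the first box above on a 640 by 480 image (the image size here is made up for illustration). Drawing those corners on top of the image is a quick way to check that a helper file looks right.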
There are many tools for creating bounding boxes; we use a particularly popular open-source tool called labelImg. It is fairly easy to use, but if you need a quick intro, you can watch this video here.

Unfortunately this is a manual step that requires going through every image file, so it takes a long time. The more classes you have, and the more images per class, the longer it takes. Try to share the work with friends, colleagues, etc. to get through it faster.
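When the labelling is split across several people, it helps to know which images are still waiting for a helper file. Below is a rough sketch for that. It assumes each helper file is saved next to its image with the same name and a .txt extension, and that your images are .jpg, .jpeg or .png files. Adjust both assumptions to match your own setup; the function name is hypothetical.

```python
from pathlib import Path

# Assumed image extensions; extend this set if your downloads differ.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_unlabelled_images(image_dir):
    """Return the names of images in image_dir with no matching .txt file."""
    image_dir = Path(image_dir)
    missing = []
    for image in sorted(image_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        # The helper file is assumed to share the image's base name.
        if not image.with_suffix(".txt").exists():
            missing.append(image.name)
    return missing
```

Running this on each species folder gives a simple to-do list that you can hand out to whoever is helping.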
