By now, anyone in the ML space in general, and computer vision in particular, must have heard at least in passing about the great YOLO object detection framework. There are countless blogs on the subject that go into depth about the architecture itself and how it achieves such high detection accuracy on Microsoft's COCO dataset.
This post, however, takes a slightly different route. I am going to walk you through a practical use case for the YOLO algorithm: a bird species detector for birds local to Sydney, Australia.
The problem statement: identify the various species of birds visiting my backyard, so that when a particularly rare species visits my home, I get notified via SMS or email. I am only going to explain the bird detection part, and not go into how the second part (SMS and email notifications) is done. That will be for another blog post.
Assumptions: Given this is only a proof-of-concept application, we are only going to identify a select number of species. Covering the full array of native birds would require a team of people to collect images, label them, and train the model. It would demand quite a bit of my time and compute, so I am keeping the number of species to five.
Credits: Most of the application side of this post uses AlexeyAB's darknet detector application. You can learn more about the algorithm and its technical details at AlexeyAB's GitHub repository.
As with any ML project, the majority of the time and effort goes into the data collection stage. After identifying the species of birds I wanted to use, I had to download about fifty to sixty pictures of each bird. I used a Chrome extension to download the pictures.
Once the images are downloaded, the next step is to pre-process them to get them ready for training. This step involves creating "bounding boxes" for each of these images, and creating a helper file corresponding to each image file to tell the model what is contained in those bounding boxes. Basically, we are drawing rectangles around the birds and telling the model what bird each one is. A helper file looks like this:

3 0.348476 0.595122 0.672561 0.809756
3 0.629878 0.342927 0.295122 0.615610
3 0.780793 0.193659 0.303049 0.377561

The first number in each row is the class of bird contained in that bounding box, given as the position of that class in the obj.names file. The four numbers following it describe the box itself: the centre x, centre y, width, and height, all normalised to values between 0 and 1 relative to the image dimensions. This is the format the YOLO model expects the training data to be in: an image file, and a corresponding helper file with the bounding box details.
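To make the annotation format concrete, here is a small sketch that converts one of those normalised rows back into pixel coordinates. The image size used below is a made-up example, not the actual dimensions of my training photos:

```python
# Parse a darknet/YOLO annotation line and convert the normalised
# box back to pixel coordinates.

def yolo_to_pixels(line, img_w, img_h):
    """Convert '<class> <cx> <cy> <w> <h>' (all normalised to 0-1)
    into (class_id, x_min, y_min, x_max, y_max) in pixels."""
    parts = line.split()
    class_id = int(parts[0])
    cx, cy, w, h = (float(p) for p in parts[1:])
    x_min = int((cx - w / 2) * img_w)
    y_min = int((cy - h / 2) * img_h)
    x_max = int((cx + w / 2) * img_w)
    y_max = int((cy + h / 2) * img_h)
    return class_id, x_min, y_min, x_max, y_max

# One of the example rows above, on a hypothetical 640x480 image:
print(yolo_to_pixels("3 0.348476 0.595122 0.672561 0.809756", 640, 480))
```

Note that because the box is stored as a centre point plus a width and height, resizing an image does not invalidate its labels; the normalised values stay the same.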
I then used this open source tool to annotate them. These two tasks are the most tedious parts of the project and require an abundance of patience. The following YouTube video will walk you through the exact steps involved in downloading and labelling the images.
Once the images are downloaded and labelled, we proceed to the second part of the exercise: training and inference. I used Google's Colab for this step. Go to my GitHub repository to download the Jupyter notebook, which walks you through each step. Essentially, we clone the darknet repository from AlexeyAB's GitHub, then download "pre-trained" weights to get us started. Using these pre-trained weights gives us a big leg up in terms of computing time. With the "seed" weights ready, we upload our training images and a few configuration files to Colab. We then start the training process, which can last between 24 and 48 hours to reach the desired accuracy. The following video, made by me, gives detailed step-by-step instructions on training and on inference using the trained model.
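Among the configuration files uploaded to Colab, darknet expects a class list (obj.names) and a small data file (obj.data) pointing at the training lists. Here is a minimal sketch that generates them; the five species names, the paths, and the backup directory are illustrative assumptions, not the exact files from my project:

```python
# Generate the two small text files darknet needs before training.
import os

# Hypothetical pick of five Sydney-area species for this example.
species = ["rainbow_lorikeet", "sulphur_crested_cockatoo",
           "australian_magpie", "noisy_miner", "kookaburra"]

os.makedirs("data", exist_ok=True)

# obj.names: one class name per line. The line order defines the class
# ids, i.e. the leading number in each annotation file.
with open("data/obj.names", "w") as f:
    f.write("\n".join(species) + "\n")

# obj.data: tells darknet how many classes there are, where the
# train/validation image lists live, and where to save weights.
with open("data/obj.data", "w") as f:
    f.write(f"classes = {len(species)}\n"
            "train = data/train.txt\n"
            "valid = data/valid.txt\n"
            "names = data/obj.names\n"
            "backup = backup/\n")
```

The "classes" count here must also match the filters and classes settings in the YOLO cfg file, which is one of the easiest things to get wrong when adapting the template config.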
I have deliberately excluded the detailed steps here, as they are already covered in the Jupyter notebook; I wanted this post to be tidy and to the point. Please give it a go, and maybe apply the OpenCV and YOLO v4 combination to your own interesting datasets to build some cool detection applications.
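One piece of the OpenCV-plus-YOLO pipeline worth understanding is what happens after the network's forward pass: many overlapping candidate boxes come out, and they must be filtered. Below is a pure-Python sketch of confidence filtering and non-maximum suppression, the same job OpenCV's cv2.dnn.NMSBoxes does for you; the thresholds and detections are made-up example values:

```python
# Post-processing for a detector: keep confident boxes, and suppress
# near-duplicate boxes that overlap a stronger detection too much.

def iou(a, b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(detections, conf_thresh=0.5, iou_thresh=0.4):
    """detections: list of (box, score) pairs. Returns the survivors."""
    dets = sorted((d for d in detections if d[1] >= conf_thresh),
                  key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in dets:
        if all(iou(box, k[0]) < iou_thresh for k in kept):
            kept.append((box, score))
    return kept

# Two near-duplicate confident boxes and one weak box:
dets = [((10, 10, 100, 100), 0.9),
        ((12, 12, 102, 102), 0.8),
        ((10, 10, 100, 100), 0.3)]
print(nms(dets))  # only the strongest box survives
```

Lowering iou_thresh suppresses duplicates more aggressively, which matters when several birds of the same species sit close together in one frame.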
Although this project may seem like just a nice parlour trick, there are many serious use cases for the technique. Scientists could use it to detect and track endangered species, livestock farmers could get alerts about potential predators, and military scouts could detect enemy movements without the need for high-end radar.
The author is a machine learning and computer vision enthusiast and can be reached via his LinkedIn profile.