There’s no shortage of interesting problems in computer vision, from simple image classification to 3D-pose estimation. One of the problems we’re most interested in and have worked on a bunch is object detection. Like many other computer vision problems, there still isn’t an obvious or even “best” way to approach the problem, meaning there’s still much room for improvement. Before getting into object detection, let’s do a quick rundown of the most common problems in the field.
Object detection vs. other computer vision problems
Image classification is probably the most well-known problem in computer vision. It consists of classifying an image into one of many different categories. One of the most popular datasets used in academia is ImageNet, composed of millions of classified images and (partially) used in the annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC). In recent years, classification models have surpassed human performance on this benchmark, and the problem has been considered practically solved. While there are plenty of challenges to image classification, there are also plenty of write-ups on how it’s usually solved and which challenges remain.
Similar to classification, localization finds the location of a single object inside the image.
Localization is useful for many real-life problems, such as smart cropping (knowing where to crop images based on where the object is located) or regular object extraction for further processing with other techniques. It can be combined with classification to not only locate the object but also categorize it into one of many possible categories.
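To make the smart cropping idea concrete, here is a minimal sketch in Python. It assumes a localization model has already returned a single bounding box in pixel coordinates; the function name `smart_crop_box`, the `(x_min, y_min, x_max, y_max)` box layout, and the `margin` parameter are all illustrative choices, not part of any particular library.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def smart_crop_box(image_width: int, image_height: int,
                   box: Box, margin: float = 0.1) -> Box:
    """Expand a localized object's box by a relative margin.

    The result is clamped so the crop never leaves the image.
    """
    x_min, y_min, x_max, y_max = box
    pad_x = int((x_max - x_min) * margin)
    pad_y = int((y_max - y_min) * margin)
    return (
        max(0, x_min - pad_x),
        max(0, y_min - pad_y),
        min(image_width, x_max + pad_x),
        min(image_height, y_max + pad_y),
    )


# An object touching the right edge of a 640x480 image.
crop = smart_crop_box(640, 480, (600, 100, 640, 200), margin=0.5)
print(crop)  # (580, 50, 640, 250)
```

The returned rectangle could then be handed to whatever image library is in use to perform the actual crop.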
Going one step further than object detection, we want not only to find the objects inside an image but also to find a pixel-by-pixel mask for each detected object. We refer to this problem as instance or object segmentation.
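The difference between a box and a mask can be illustrated with a small sketch. It assumes a mask is represented as a grid of 0s and 1s (one value per pixel); the helper `mask_to_box` is a hypothetical name used only for this example. A mask can always be reduced to a bounding box, but not the other way around, which is why segmentation is the more detailed output.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # assumed layout: mask[y][x] is 1 where the object is


def mask_to_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return the tightest (x_min, y_min, x_max, y_max) box around a mask.

    x_max and y_max are exclusive. Returns None for an empty mask.
    """
    if not mask:
        return None
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    if not rows:
        return None
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)


# A small diamond-shaped object in a 5x4 image.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
box = mask_to_box(mask)
object_pixels = sum(sum(row) for row in mask)
box_pixels = (box[2] - box[0]) * (box[3] - box[1])
print(box, object_pixels, box_pixels)  # (1, 1, 4, 4) 5 9
```

Here the box covers 9 pixels while the object itself only occupies 5 of them; the mask keeps that distinction and the box loses it.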
Iterating over the problem of localization plus classification, we end up needing to detect and classify multiple objects at the same time. Object detection is the problem of finding and classifying a variable number of objects in an image. The important difference is the “variable” part. In contrast with problems like classification, the output of object detection is variable in length, since the number of objects detected may change from image to image. In this post we’ll go into the details of practical applications, the main issues of object detection as a machine learning problem, and how the approach to tackling it has shifted in recent years with deep learning.
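A rough sketch of what “variable in length” means in practice: a classifier returns one label per image, while a detector returns a list whose length depends on the image. The `Detection` class, its field names, and the score threshold below are assumptions made for illustration, not the output format of any specific model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    label: str                       # predicted category
    score: float                     # model confidence, assumed in [0, 1]
    box: Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max) in pixels


def keep_confident(detections: List[Detection],
                   min_score: float = 0.5) -> List[Detection]:
    """Keep only detections at or above a confidence threshold.

    The threshold is an illustrative post-processing step.
    """
    return [d for d in detections if d.score >= min_score]


# Classification: always exactly one label per image.
classification_output = "street"

# Detection: the number of results changes from image to image.
street = [
    Detection("car", 0.92, (10, 40, 200, 160)),
    Detection("person", 0.81, (220, 30, 260, 150)),
    Detection("dog", 0.30, (300, 100, 340, 140)),
]
empty_room: List[Detection] = []

print(len(keep_confident(street)), len(keep_confident(empty_room)))  # 2 0
```

Any code consuming a detector's output has to handle zero, one, or many results, which is one reason the problem is harder to model than classification.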
At Tryolabs we specialize in applying state-of-the-art machine learning to solve business problems, so even though we love all the crazy machine learning research problems, at the end of the day we end up worrying a lot more about the applications.
Even though object detection is still a somewhat new tool in the industry, there are already many useful and exciting applications using it.
Since the mid-2000s, some point-and-shoot cameras have come with face detection for more efficient auto-focus. While it’s a narrower type of object detection, the methods used apply to other types of objects, as we’ll describe later.