Object detection is a key step toward improving the performance of complex downstream computer vision tasks. It has been studied extensively for many years, and current state-of-the-art 2D object detection techniques deliver strong results even on complex images. In this chapter, we discuss the pioneering geometry-based works in object detection, followed by recent breakthroughs that employ deep learning. Some of these use a monolithic architecture that takes an RGB image as input and passes it to a feed-forward ConvNet or vision Transformer. These methods thereby predict class probabilities and bounding-box coordinates in a single unified pipeline. Two-stage architectures, on the other hand, first generate region proposals and then feed them to a CNN to extract features and predict the object category and bounding box. We also elaborate on the applications of object detection in video event recognition, where it helps achieve better fine-grained video classification performance. Further, we highlight recent datasets for 2D object detection in both images and videos, and present a comparative performance summary of various state-of-the-art object detection techniques.