[Recommended] Deep Learning for Object Detection: A Comprehensive Review

September 13, 2017, 机器学习研究会 (Machine Learning Research Society)
Summary
 

Reposted from: 爱可可-爱生活

With the rise of autonomous vehicles, smart video surveillance, facial detection and various people counting applications, fast and accurate object detection systems are rising in demand. These systems involve not only recognizing and classifying every object in an image, but localizing each one by drawing the appropriate bounding box around it. This makes object detection a significantly harder task than its traditional computer vision predecessor, image classification.

Fortunately, the most successful approaches to object detection are currently extensions of image classification models. A few months ago, Google released a new object detection API for TensorFlow. With this release came pre-built architectures and weights for a few specific models:

  • Single Shot Multibox Detector (SSD) with MobileNets

  • SSD with Inception V2

  • Region-Based Fully Convolutional Networks (R-FCN) with ResNet 101

  • Faster R-CNN with ResNet 101

  • Faster R-CNN with Inception ResNet v2

In my last blog post, I covered the intuition behind the three base network architectures listed above: MobileNets, Inception, and ResNet. This time around, I want to do the same for TensorFlow’s object detection models: Faster R-CNN, R-FCN, and SSD. By the end of this post, we will hopefully have gained an understanding of how deep learning is applied to object detection, and how these object detection models both inspire and diverge from one another.

Faster R-CNN

Faster R-CNN is now a canonical model for deep learning-based object detection. It helped inspire many detection and segmentation models that came after it, including the two others we’re going to examine today. Unfortunately, we can’t really begin to understand Faster R-CNN without understanding its own predecessors, R-CNN and Fast R-CNN, so let’s take a quick dive into its ancestry.

R-CNN

R-CNN is the grand-daddy of Faster R-CNN. In other words, R-CNN really kicked things off.

R-CNN, or Region-based Convolutional Neural Network, consisted of 3 simple steps:

  1. Scan the input image for possible objects using an algorithm called Selective Search, generating ~2000 region proposals

  2. Run a convolutional neural net (CNN) on top of each of these region proposals

  3. Take the output of each CNN and feed it into a) an SVM to classify the region and b) a linear regressor to tighten the bounding box of the object, if such an object exists.

These 3 steps are illustrated in the image below:

In other words, we first propose regions, then extract features, and then classify those regions based on their features. In essence, we have turned object detection into an image classification problem. R-CNN was very intuitive, but very slow.
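The three steps can be sketched in plain Python. This is only a toy illustration of the control flow, not the original implementation: `propose_regions` uses random boxes as a stand-in for Selective Search, `extract_features` computes two crop statistics as a stand-in for the CNN, and the classifier and box regressor are hand-set linear functions rather than trained models. All function names and parameters here are hypothetical.

```python
import random

def propose_regions(width, height, n=2000, seed=0):
    """Stand-in for Selective Search: n random boxes (x1, y1, x2, y2)."""
    rng = random.Random(seed)
    boxes = []
    for _ in range(n):
        x1 = rng.randrange(0, width - 1)
        y1 = rng.randrange(0, height - 1)
        x2 = rng.randrange(x1 + 1, width)
        y2 = rng.randrange(y1 + 1, height)
        boxes.append((x1, y1, x2, y2))
    return boxes

def extract_features(image, box):
    """Stand-in for the CNN: mean and max intensity of the cropped region."""
    x1, y1, x2, y2 = box
    pixels = [image[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    return [sum(pixels) / len(pixels), max(pixels)]

def classify(features, weights, bias):
    """Stand-in for the SVM: a linear score, positive means 'object'."""
    return sum(w * f for w, f in zip(weights, features)) + bias

def refine_box(box, features, reg_weights):
    """Stand-in for the linear regressor: one linear offset per coordinate.
    (Assumption: real R-CNN regresses center/size offsets, not raw corners.)"""
    deltas = [sum(w * f for w, f in zip(row, features)) for row in reg_weights]
    return tuple(c + d for c, d in zip(box, deltas))

def rcnn_detect(image, weights, bias, reg_weights, n_proposals=2000):
    height, width = len(image), len(image[0])
    detections = []
    for box in propose_regions(width, height, n_proposals):  # step 1
        feats = extract_features(image, box)                  # step 2: once per region
        score = classify(feats, weights, bias)                # step 3a
        if score > 0:
            refined = refine_box(box, feats, reg_weights)     # step 3b
            detections.append((refined, score))
    return detections
```

Note that the feature extractor runs inside the loop, once per proposal. With ~2000 overlapping proposals per image, this repeated work is the main reason the approach is slow.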

Fast R-CNN

R-CNN’s immediate descendant was Fast R-CNN. Fast R-CNN resembled the original in many ways, but improved on its detection speed through two main augmentations:

  1. Performing feature extraction over the image before proposing regions, thus running only one CNN over the entire image instead of 2000 CNNs over 2000 overlapping regions

  2. Replacing the SVM with a softmax layer, thus extending the neural network for predictions instead of creating a new model
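A minimal sketch of these two changes, under loud assumptions: `backbone` is a 2x2 average pool standing in for the shared CNN, `roi_max_pool` reduces each region to a single value (the real model pools each region to a fixed-size grid of features), and the class weights are hand-set rather than learned. All names are hypothetical, and the image is assumed to have even width and height.

```python
import math

def backbone(image):
    """Stand-in for the shared CNN: a 2x2 average pool, run once per image."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * y][2 * x] + image[2 * y][2 * x + 1]
              + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]

def roi_max_pool(feature_map, box, stride=2):
    """Project an image-space box onto the feature map and max-pool it."""
    x1, y1, x2, y2 = box
    fx1, fy1 = x1 // stride, y1 // stride
    fx2 = max(fx1 + 1, -(-x2 // stride))  # ceiling division, at least one cell
    fy2 = max(fy1 + 1, -(-y2 // stride))
    return max(feature_map[y][x]
               for y in range(fy1, fy2) for x in range(fx1, fx2))

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fast_rcnn_detect(image, boxes, class_weights, class_biases):
    feature_map = backbone(image)  # change 1: a single pass over the image
    results = []
    for box in boxes:
        feat = roi_max_pool(feature_map, box)  # reuse the shared features
        scores = [w * feat + b for w, b in zip(class_weights, class_biases)]
        results.append((box, softmax(scores)))  # change 2: softmax, not an SVM
    return results
```

Compared with the R-CNN loop, the expensive feature computation has moved outside the per-region loop; each region only indexes into the shared feature map.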

The new model looked something like this:


Link (may require a VPN):

https://medium.com/towards-data-science/deep-learning-for-object-detection-a-comprehensive-review-73930816d8d9


Original post link:

https://m.weibo.cn/1402400261/4151522189164931


