[Recommended] ResNet, AlexNet, VGG, Inception: Understanding Various Convolutional Network Architectures

December 17, 2017 | 机器学习研究会
Abstract
 

Reposted from: 爱可可-爱生活

Convolutional neural networks are fantastic for visual recognition tasks. Good ConvNets are beasts with millions of parameters and many hidden layers. In fact, a bad rule of thumb is: 'the more hidden layers, the better the network'. AlexNet, VGG, Inception, and ResNet are some of the popular networks. Why do these networks work so well? How are they designed? Why do they have the structures they have? One wonders. The answers to these questions are not trivial and certainly can't be covered in one blog post. However, in this blog, I shall try to discuss some of these questions. Network architecture design is a complicated process that takes a while to learn, and even longer to experiment with on your own. But first, let's put things in perspective:


Why are ConvNets beating traditional computer vision?

Image classification is the task of classifying a given image into one of a set of pre-defined categories. The traditional pipeline for image classification involves two modules, viz. feature extraction and classification.

Feature extraction involves extracting a higher level of information from raw pixel values that can capture the distinctions among the categories involved. This feature extraction is done in an unsupervised manner, wherein the classes of the images have nothing to do with the information extracted from the pixels. Some of the traditional and widely used features are GIST, HOG, SIFT, LBP, etc. After the features are extracted, a classification module is trained with the images and their associated labels. A few examples of this module are SVM, logistic regression, random forest, decision trees, etc.
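The two-module pipeline described above can be sketched in a few lines. This is a minimal, hypothetical illustration: the quadrant-statistics "feature" stands in for real descriptors like HOG or GIST, and the nearest-centroid classifier stands in for an SVM or random forest; none of these toy components come from the original post.

```python
import numpy as np

def extract_features(img):
    """Toy hand-crafted feature: mean and standard deviation of each
    image quadrant (a stand-in for real descriptors like HOG or GIST).
    Note it is fixed in advance -- it cannot adapt to the classes."""
    h, w = img.shape
    quads = [img[:h//2, :w//2], img[:h//2, w//2:],
             img[h//2:, :w//2], img[h//2:, w//2:]]
    return np.array([stat for q in quads for stat in (q.mean(), q.std())])

class NearestCentroid:
    """Minimal classification module: predicts the class whose mean
    feature vector is closest (a stand-in for an SVM or random forest)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0)
                                    for c in self.classes_])
        return self

    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

# Two synthetic "classes": dark images (label 0) vs. bright images (label 1).
rng = np.random.default_rng(0)
dark = [rng.uniform(0.0, 0.3, (16, 16)) for _ in range(20)]
bright = [rng.uniform(0.7, 1.0, (16, 16)) for _ in range(20)]
X = np.array([extract_features(im) for im in dark + bright])
y = np.array([0] * 20 + [1] * 20)

clf = NearestCentroid().fit(X, y)
print(clf.predict(X[:2]), clf.predict(X[-2:]))  # dark -> 0, bright -> 1
```

The key point is the separation: `extract_features` is frozen and class-agnostic, and only the classifier sees the labels, which is exactly the limitation the next paragraph discusses.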

The problem with this pipeline is that the feature extraction cannot be tweaked according to the classes and images. So if the chosen feature lacks the representation required to distinguish the categories, the accuracy of the classification model suffers a lot, irrespective of the classification strategy employed. A common theme among state-of-the-art methods following the traditional pipeline has been to pick multiple feature extractors and club them inventively to get a better feature. But this involves too many heuristics, as well as manual labor to tweak parameters for the domain at hand, to reach a decent level of accuracy. By decent I mean reaching close to human-level accuracy. That's why it took years to build good computer vision systems (like OCR, face verification, image classifiers, object detectors, etc.) that can work with the wide variety of data encountered in practical applications, using traditional computer vision. We once produced better results using ConvNets for a company (a client of my start-up) in 6 weeks, which had taken them close to a year to achieve using traditional computer vision.

Another problem with this method is that it is completely different from how we humans learn to recognize things. Just after birth, a child is incapable of perceiving its surroundings, but as it progresses and processes data, it learns to identify things. This is the philosophy behind deep learning, wherein no hard-coded feature extractor is built in. It combines the extraction and classification modules into one integrated system, which learns to extract discriminative representations from the images and to classify them based on supervised data.

One such system is the multilayer perceptron, a.k.a. the neural network, which consists of multiple layers of neurons densely connected to each other. A deep vanilla neural network has such a large number of parameters that it is impossible to train such a system without overfitting the model, due to the lack of a sufficient number of training examples. But with Convolutional Neural Networks (ConvNets), the task of training the whole network from scratch can be carried out using a large dataset like ImageNet. The reason is the sharing of parameters between neurons and the sparse connections in convolutional layers. This can be seen in Figure 2. In the convolution operation, the neurons in one layer are only locally connected to the input neurons, and the set of parameters is shared across the 2-D feature map.
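The effect of parameter sharing on model size is easy to make concrete with back-of-the-envelope arithmetic. The layer sizes below (a 32x32x3 input, 64 output maps, 3x3 kernels) are illustrative choices, not taken from the post:

```python
# Parameter-count comparison for one layer on a 32x32x3 input producing
# 64 output channels / feature maps, illustrating why weight sharing and
# sparse local connectivity make ConvNets trainable where dense nets are not.
in_h, in_w, in_c = 32, 32, 3
out_c, k = 64, 3  # 64 output maps, 3x3 kernels

# Fully connected: every one of the 32*32*64 output units connects to
# every one of the 32*32*3 input values.
dense_params = (in_h * in_w * in_c) * (in_h * in_w * out_c)

# Convolutional: one k x k x in_c filter per output map, shared across
# all spatial positions, plus one bias per map.
conv_params = out_c * (k * k * in_c + 1)

print(dense_params)  # 201326592 (~200M weights for a single dense layer)
print(conv_params)   # 1792
```

A dense layer at this resolution needs roughly 200 million weights, while the convolutional layer needs fewer than two thousand, which is precisely why convolutional layers can be trained from scratch without immediately overfitting.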

Link:

http://cv-tricks.com/cnn/understand-resnet-alexnet-vgg-inception/


Original post link:

https://m.weibo.cn/1402400261/4186005294906515


Related Content

A residual neural network (ResNet) is an artificial neural network (ANN) that uses skip connections to jump over some layers. A typical ResNet model is built from double- or triple-layer skips that contain nonlinearities (ReLU). Residual networks are easy to optimize and can improve accuracy by adding considerable depth. Their internal residual blocks use skip connections, which alleviate the vanishing-gradient problem that comes with increasing depth in deep neural networks.
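The residual block described above can be sketched in plain numpy. This is a simplified illustration using dense layers instead of convolutions, and the `residual_block` function and weight shapes are my own hypothetical choices; real ResNet blocks also include batch normalization, which is omitted here:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Two-layer residual block: output = relu(x + F(x)),
    where F(x) = relu(x @ w1) @ w2.  The skip connection adds the
    input straight through, so gradients can bypass F entirely."""
    return relu(x + relu(x @ w1) @ w2)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))           # batch of 4, width 8
w1 = 0.1 * rng.normal(size=(8, 8))
w2 = 0.1 * rng.normal(size=(8, 8))

y = residual_block(x, w1, w2)
print(y.shape)  # (4, 8) -- the skip connection requires matching shapes

# If the residual branch F is zeroed out, the block reduces to relu(x):
# the identity path makes "do nothing" trivially easy to learn, which is
# why stacking many such blocks still optimizes well.
assert np.allclose(residual_block(x, np.zeros((8, 8)), np.zeros((8, 8))),
                   relu(x))
```

The final assertion is the intuition behind easy optimization: a block only has to learn the *residual* correction to the identity, not a full transformation.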

We'd like to share a simple tweak of Single Shot Multibox Detector (SSD) family of detectors, which is effective in reducing model size while maintaining the same quality. We share box predictors across all scales, and replace convolution between scales with max pooling. This has two advantages over vanilla SSD: (1) it avoids score miscalibration across scales; (2) the shared predictor sees the training data over all scales. Since we reduce the number of predictors to one, and trim all convolutions between them, model size is significantly smaller. We empirically show that these changes do not hurt model quality compared to vanilla SSD.
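The structural change in that abstract, one shared predictor applied at every scale, with max pooling instead of convolutions between scales, can be sketched as follows. This is a hypothetical shape-level illustration only: the `max_pool2x2` and `shared_predictor` helpers and all sizes are my own, and a real SSD predictor is a small convolution producing class scores and box offsets per anchor.

```python
import numpy as np

def max_pool2x2(fmap):
    """Stand-in for the between-scale convolutions: plain 2x2 max pooling."""
    h, w, c = fmap.shape
    return fmap[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2, c).max(axis=(1, 3))

def shared_predictor(fmap, w):
    """One 1x1 box predictor, applied with the SAME weights at every scale."""
    return fmap @ w  # (h, w, c) @ (c, n_out) -> (h, w, n_out)

rng = np.random.default_rng(0)
c, n_out = 16, 6                      # channels; predictions per location
w = rng.normal(size=(c, n_out))       # the single shared predictor

fmap = rng.normal(size=(8, 8, c))     # finest-scale feature map
scales = [fmap, max_pool2x2(fmap), max_pool2x2(max_pool2x2(fmap))]
preds = [shared_predictor(s, w) for s in scales]
print([p.shape for p in preds])  # [(8, 8, 6), (4, 4, 6), (2, 2, 6)]
```

Because `w` is the only predictor, its score scale is the same at every resolution (no per-scale miscalibration), it sees training signal from all scales, and the model stores one predictor's weights instead of one per scale.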


Recent CNN-based object detectors, whether one-stage methods like YOLO, SSD, and RetinaNet or two-stage detectors like Faster R-CNN, R-FCN, and FPN, usually try to fine-tune directly from ImageNet pre-trained models designed for image classification. There has been little work discussing backbone feature extractors designed specifically for object detection. More importantly, there are several differences between the tasks of image classification and object detection. 1. Recent object detectors like FPN and RetinaNet usually involve extra stages, compared to image classification, to handle objects at various scales. 2. Object detection not only needs to recognize the category of the object instances but also to spatially locate their positions. A large downsampling factor brings a large valid receptive field, which is good for image classification but compromises the ability to localize objects. Because of this gap between image classification and object detection, we propose DetNet in this paper, a novel backbone network specifically designed for object detection. Moreover, DetNet includes extra stages compared to traditional backbone networks for image classification, while maintaining high spatial resolution in deeper layers. Without any bells and whistles, state-of-the-art results have been obtained for both object detection and instance segmentation on the MSCOCO benchmark based on our DetNet (4.8G FLOPs) backbone. The code will be released for reproduction.
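The downsampling-versus-localization trade-off that motivates this work can be quantified with the standard receptive-field recurrences. The layer configurations below are hypothetical numbers of my own choosing, not DetNet's actual architecture; they only illustrate the arithmetic.

```python
# Receptive-field bookkeeping for a stack of conv layers, using the
# standard recurrences:  r_out = r_in + (k - 1) * j_in  and  j_out = j_in * s,
# where r is the receptive field, j the cumulative stride ("jump"),
# k the kernel size, and s the layer stride.

def receptive_field(layers):
    r, j = 1, 1
    for k, s in layers:  # (kernel_size, stride) per layer
        r += (k - 1) * j
        j *= s
    return r, j

# Five stride-2 3x3 convs: classifier-style output stride of 32.
r1, stride1 = receptive_field([(3, 2)] * 5)
print(r1, stride1)  # 63 32 -- big receptive field, but a coarse 32-px grid

# Keep the last two layers at stride 1 (the DetNet-style idea of holding
# spatial resolution in deeper layers): a finer 8-px grid.
r2, stride2 = receptive_field([(3, 2)] * 3 + [(3, 1)] * 2)
print(r2, stride2)  # 47 8 -- smaller field, but 4x finer localization
```

The second configuration trades some receptive field for a 4x finer prediction grid, which is the kind of balance a detection backbone must strike and a classification backbone need not.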
