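In deep learning, a convolutional neural network (CNN or ConvNet) is a class of deep neural networks most commonly used to analyze visual imagery. Because of their shared-weight architecture and translation-invariance characteristics, they are also known as shift invariant or space invariant artificial neural networks (SIANN). They have applications in image and video recognition, recommender systems, image classification, medical image analysis, natural language processing, and financial time series.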
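To make the shared-weight idea concrete, here is a minimal sketch in plain Python. It is a 1D toy rather than a real image convolution, and the names `conv1d`, `signal`, and `kernel` are illustrative assumptions, not taken from any framework. The same small kernel is reused at every position, so shifting the input simply shifts the output; taking a global maximum afterwards gives a value that does not change under the shift.

```python
def conv1d(signal, kernel):
    """Valid cross-correlation: slide ONE shared kernel over the signal."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

kernel = [1, 0, -1]                      # the only weights, shared by all positions
signal = [0, 0, 1, 2, 1, 0, 0, 0]
shifted = [0] + signal[:-1]              # same pattern moved one step to the right

out = conv1d(signal, kernel)
out_shifted = conv1d(shifted, kernel)

print(out)          # [-1, -2, 0, 2, 1, 0]
print(out_shifted)  # [0, -1, -2, 0, 2, 1]

# The response moves with the input (equivariance) ...
print(out_shifted[1:] == out[:-1])       # True
# ... and a global max-pool over it is unchanged by the shift.
print(max(out) == max(out_shifted))      # True
```

Strictly speaking, the convolution itself is shift equivariant; the invariance in this sketch comes from the pooling step on top of it.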

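Convolutional Neural Networks (CNN) from Beginner to Mastery: A Summary from Someone Who Has Been There

Getting Started

Deep learning is an empirical science in which experimental progress has far outpaced theoretical research, so this article is organized to combine theory with practice.

A Rough Overview

First, browse the related articles under the deep learning entry on Zhuanzhi.

For convolutional neural networks specifically, the following articles introduce the basic concepts: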

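An intuitive explanation of how convolutional neural networks work? https://www.zhihu.com/question/39022858

For the technically minded: understanding convolutional neural networks (CNN) in one article http://dataunion.org/11692.html

Deep learning pioneer Yann LeCun explains convolutional neural networks in detail https://www.leiphone.com/news/201608/zaB48AcZ1AFm1TaP.html

CNN notes: a plain-language look at convolutional neural networks https://www.2cto.com/kf/201607/522441.html

Once the basic concepts are clear, you also need an intuitive feel for CNNs, and visualizing what the network has learned is a very good way to get it:

Chinese notes on Visualizing and Understanding Convolutional Networks http://www.gageet.com/2014/10235.php

The original English paper, for those interested: https://arxiv.org/abs/1311.2901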

Basic Practice

Before starting hands-on work, try out the TensorFlow Playground first. Address: http://playground.tensorflow.org/ Guide: http://f.dataguru.cn/article-9324-1.html

After that you can experiment on your own machine. First of all, a GPU is a must:

Installing CUDA: http://blog.csdn.net/u010480194/article/details/54287335

Installing cuDNN: http://blog.csdn.net/lucifer_zzq/article/details/76675239

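Next, choose a framework that suits you:

Which deep learning framework is the most popular right now? https://www.zhihu.com/question/52517062?answer_deleted_redirect=true

In depth | Comparing mainstream deep learning frameworks: which one suits you best? http://mp.weixin.qq.com/s?__biz=MzA3MzI4MjgzMw==&mid=2650719118&idx=2&sn=fad8b7cad70cc6a227f88ae07a89db66#rd

There is also a GitHub project dedicated to ranking frameworks, and it is updated fairly often: https://github.com/hunkim/DeepLearningStars

If you have trouble choosing, here is a no-guarantees recommendation of two frameworks: TensorFlow and PyTorch. TensorFlow handles visualization and engineering integration well; PyTorch leaves you more freedom in implementation and is pleasant to use.

TensorFlow official site: http://www.tensorflow.org/

PyTorch official site: http://pytorch.org/

Following the installation instructions on the official sites step by step generally works without trouble. If you do run into a problem, search for a solution on https://stackoverflow.com/ where an answer can almost always be found.

You also need to get familiar with an important tool, GitHub https://github.com/ which is convenient both for managing your own code and for learning from other people's. For a tutorial, see this answer: https://www.zhihu.com/question/20070065

Of course, if you would rather not read it, an IDE can help with code management, for example PyCharm: http://www.jetbrains.com/pycharm/ Tutorial: http://blog.csdn.net/u013088062/article/details/50349833

An interactive visualization tool matters a lot as well; the recommendation here is Jupyter Notebook: http://python.jobbole.com/87527/?repeat=w3tc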

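With these preparations done, you can start on the introductory tutorials. The official tutorials are in fact very good, but if reading everything in English is a struggle, the following Chinese-language tutorials are alternatives.

TensorFlow

How do I get started with TensorFlow? https://www.zhihu.com/question/49909565

Getting started with TensorFlow http://hacker.duanshishi.com/?p=1639

Google's official tutorial is actually quite complete; if you would rather not read English, see this Chinese translation: http://wiki.jikexueyuan.com/project/tensorflow-zh/

PyTorch

PyTorch deep learning: a 60-minute introduction (translation) https://zhuanlan.zhihu.com/p/25572330

How should a beginner get started with PyTorch? https://www.zhihu.com/question/55720139

Super simple! PyTorch introductory tutorial (1): Tensor http://www.jianshu.com/p/5ae644748f21

If you are not familiar with Python, start with these two tutorials. Python 2: http://www.runoob.com/python/python-tutorial.html Python 3: http://www.runoob.com/python3/python3-tutorial.html

If you are only dabbling and do not want to spend much time on a framework, try Keras:

Keras introductory tutorial http://www.360doc.com/content/17/0624/12/1489589_666148811.shtml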

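Advanced Study

After the introduction above, you should have a basic concept of convolutional neural networks and a basic idea of how to implement one. Advanced study likewise has two sides.

Deeper Theory

First comes the backpropagation algorithm. You can skip it when starting out, since the common frameworks all provide automatic differentiation, but you must understand it clearly to go any further. Tutorial: http://blog.csdn.net/u014313009/article/details/51039334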
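As a rough illustration of what the frameworks automate, here is a hand-written backward pass for a toy network with one hidden unit, checked against a finite-difference gradient. The network shape, the tanh activation, the squared-error loss, and all names are assumptions made for this sketch; they do not come from the linked tutorial.

```python
import math

def forward(x, w1, w2):
    h = math.tanh(w1 * x)        # hidden activation
    y = w2 * h                   # network output
    return h, y

def loss(y, t):
    return 0.5 * (y - t) ** 2    # squared error against target t

def backward(x, t, w1, w2):
    """Apply the chain rule from the loss back to each weight."""
    h, y = forward(x, w1, w2)
    dy = y - t                   # dL/dy
    dw2 = dy * h                 # dL/dw2 = dL/dy * dy/dw2
    dh = dy * w2                 # dL/dh  = dL/dy * dy/dh
    dw1 = dh * (1 - h * h) * x   # tanh'(a) = 1 - tanh(a)^2, then da/dw1 = x
    return dw1, dw2

def numeric_grad(f, v, eps=1e-6):
    """Central finite difference, used only to check the analytic result."""
    return (f(v + eps) - f(v - eps)) / (2 * eps)

x, t, w1, w2 = 0.5, 1.0, 0.8, -0.3
dw1, dw2 = backward(x, t, w1, w2)
num_dw1 = numeric_grad(lambda v: loss(forward(x, v, w2)[1], t), w1)
num_dw2 = numeric_grad(lambda v: loss(forward(x, w1, v)[1], t), w2)

print(abs(dw1 - num_dw1) < 1e-6, abs(dw2 - num_dw2) < 1e-6)
```

Comparing analytic and numerical gradients like this is also a handy debugging habit when you write a custom layer.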

Next, get familiar with a few classic CNN models.

The foundational model: AlexNet

Paper: ImageNet Classification with Deep Convolutional Neural Networks http://ml.informatik.uni-freiburg.de/former/_media/teaching/ws1314/dl/talk_simon_group2.pdf

Explanation: http://blog.csdn.net/u014088052/article/details/50898842

Code: TensorFlow https://github.com/kratzert/finetune_alexnet_with_tensorflow PyTorch https://github.com/aaron-xichen/pytorch-playground
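When reading model code like this, it helps to be able to work out feature-map sizes by hand. The sketch below uses the standard output-size formula for convolution and pooling layers. The helper name `conv_output_size` is made up for illustration, and the 227-pixel input is an assumption: it is the size for which an 11x11 kernel with stride 4 and no padding divides evenly, which is how AlexNet's first layer is usually worked through.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (floor division)."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# First AlexNet-style convolution: 11x11 kernel, stride 4, no padding.
first_conv = conv_output_size(227, 11, stride=4)
# Overlapping max pooling: 3x3 window, stride 2.
after_pool = conv_output_size(first_conv, 3, stride=2)

print(first_conv, after_pool)  # 55 27
```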

The model that defined an era: ResNet

Paper: Deep Residual Learning for Image Recognition https://arxiv.org/abs/1512.03385

Explanation: http://blog.csdn.net/wspba/article/details/56019373

Code: TensorFlow https://github.com/ry/tensorflow-resnet PyTorch https://github.com/isht7/pytorch-deeplab-resnet
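The core idea of ResNet, adding the block's input back onto its output, fits in a few lines. The sketch below is a deliberate simplification: it uses small fully connected layers in place of convolutions and leaves out batch normalization, and every name in it is hypothetical. It only shows the shortcut connection y = ReLU(F(x) + x).

```python
def relu(v):
    return [max(0.0, a) for a in v]

def linear(v, weights, bias):
    """Plain matrix-vector product plus bias (stand-in for a conv layer)."""
    return [sum(w * a for w, a in zip(row, v)) + b
            for row, b in zip(weights, bias)]

def residual_block(x, w1, b1, w2, b2):
    # F(x): two layers with a ReLU in between.
    f = linear(relu(linear(x, w1, b1)), w2, b2)
    # Shortcut: add the input back before the final activation.
    return relu([xi + fi for xi, fi in zip(x, f)])

zeros_w = [[0.0, 0.0], [0.0, 0.0]]
zeros_b = [0.0, 0.0]
x = [1.0, 2.0]

# With all-zero weights F(x) is 0, so the block passes x through unchanged.
print(residual_block(x, zeros_w, zeros_b, zeros_w, zeros_b))  # [1.0, 2.0]
```

That pass-through behavior is the point: a block that has learned nothing yet still lets the signal through, which is part of why very deep stacks of such blocks remain trainable.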

A recent model that works well: DenseNet

Paper: Densely Connected Convolutional Networks https://arxiv.org/pdf/1608.06993.pdf

Explanation: http://blog.csdn.net/u014380165/article/details/75142664

Code: original https://github.com/liuzhuang13/DenseNet TensorFlow https://github.com/YixuanLi/densenet-tensorflow PyTorch https://github.com/bamos/densenet.pytorch
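In a dense block every layer receives the concatenated feature maps of all the layers before it, so the number of input channels grows linearly with depth. Here is a small bookkeeping sketch; the function name is hypothetical, and the example numbers (64 input channels, growth rate 32, 6 layers) are just illustrative values.

```python
def dense_block_channels(in_channels, growth_rate, num_layers):
    """Input channels seen by each layer, and channels leaving the block."""
    # Layer i sees the block input plus the outputs of the i layers before it.
    per_layer_inputs = [in_channels + growth_rate * i for i in range(num_layers)]
    out_channels = in_channels + growth_rate * num_layers
    return per_layer_inputs, out_channels

per_layer_inputs, out_channels = dense_block_channels(64, 32, 6)
print(per_layer_inputs)  # [64, 96, 128, 160, 192, 224]
print(out_channels)      # 256
```

Each layer adds only `growth_rate` new feature maps, which is why a DenseNet can stay fairly narrow per layer while still reusing every earlier feature.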

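Read the explanation first and then the source code. This deepens your understanding of the model, and you also pick up new framework techniques from other people's code.

Of course, we should not stop at the surface. A very well-known book, Deep Learning, is recommended here; the Chinese edition is available at https://github.com/exacity/deeplearningbook-chinese

More fundamental theoretical research is still lacking at present.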

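Deeper Practice

If you have the patience, work through Stanford's newly offered course: https://www.bilibili.com/video/av9156347/

In practice there is a great deal to learn. Before diving in, it is best to read up on hyperparameter tuning tips:

What are some tips for tuning deep learning models? https://www.zhihu.com/question/25097993

There used to be a tuning bible, Neural Networks: Tricks of the Trade, but it is too dated and not recommended.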

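Modules that used to be common, such as dropout and LRN, have been used less and less recently, so they are not covered here. For regularization, BatchNorm is recommended: https://www.zhihu.com/question/38102762 The idea is simple and it works well.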
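The "simple idea" behind BatchNorm can be sketched in a few lines: normalize each feature over the mini-batch, then apply a learnable scale and shift. This is a training-mode forward pass for a single feature only; the running statistics used at inference time and the backward pass are left out, and the function and parameter names are assumptions for illustration.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n    # biased batch variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

activations = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(activations)

out_mean = sum(normalized) / len(normalized)
out_var = sum((x - out_mean) ** 2 for x in normalized) / len(normalized)
print(abs(out_mean) < 1e-9, abs(out_var - 1.0) < 1e-3)
```

With `gamma=1` and `beta=0` the output has roughly zero mean and unit variance; the network is free to learn other values of the two parameters.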

Although training is generally very stable once BatchNorm is in place, it is still worth learning about gradient clipping: http://blog.csdn.net/zyf19930610/article/details/71743291
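One common form of gradient clipping rescales the whole gradient whenever its overall norm exceeds a threshold, which keeps the direction but caps the step size. A minimal sketch follows; the flat list of gradients and the function name are simplifications chosen for this example, not the interface of any particular framework.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale grads so that their L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads), norm        # small enough: leave unchanged
    scale = max_norm / norm
    return [g * scale for g in grads], norm

clipped, original_norm = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
print(original_norm)                    # 5.0
print(clipped)                          # approximately [0.6, 0.8]
```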

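The activation function is another important point, but in convolutional neural networks you can basically default to ReLU http://www.cnblogs.com/neopenx/p/4453161.html which is fast to compute; ReLU plus BatchNorm is close to a universal recipe. Some specific tasks still call for case-by-case analysis; GANs, for example, are not well suited to such a blunt activation function.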
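For reference, ReLU and its gradient are about as cheap as an activation can get, which is where the speed comes from. The sketch below also includes a leaky variant purely for contrast, as one commonly used alternative when zero gradients for negative inputs are a problem; the slope value of 0.01 is an arbitrary illustrative choice.

```python
def relu(x):
    return x if x > 0 else 0.0

def relu_grad(x):
    # Gradient is 1 for positive inputs and 0 otherwise (0 chosen at x == 0).
    return 1.0 if x > 0 else 0.0

def leaky_relu(x, slope=0.01):
    # Keeps a small, nonzero response for negative inputs.
    return x if x > 0 else slope * x

values = [-2.0, 0.0, 3.0]
print([relu(v) for v in values])       # [0.0, 0.0, 3.0]
print([relu_grad(v) for v in values])  # [0.0, 0.0, 1.0]
print([leaky_relu(v) for v in values])
```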

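With the structure basically in place, the next step is optimization. There are many optimization algorithms; the most common are SGD and Adam.

An overview of all the optimization algorithms http://www.mamicode.com/info-detail-1931210.html

A good algorithm can converge faster or reach a better result, but in most experiments SGD and Adam are enough.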
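To show the difference between the two update rules, here is a minimal sketch that minimizes the toy function f(w) = (w - 3)^2 with plain SGD and with Adam. The objective, the learning rates, and the step counts are all assumptions picked for the demonstration; the Adam hyperparameters beta1 = 0.9, beta2 = 0.999, and eps = 1e-8 are the defaults suggested in the Adam paper.

```python
import math

def grad(w):
    return 2.0 * (w - 3.0)              # derivative of (w - 3)^2

def sgd(w, lr=0.1, steps=100):
    for _ in range(steps):
        w -= lr * grad(w)               # step straight down the gradient
    return w

def adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_sgd = sgd(0.0)
w_adam = adam(0.0)
print(round(w_sgd, 3), round(w_adam, 3))
```

Both runs end up near the minimum at w = 3. On a problem this simple SGD is perfectly adequate; Adam's per-parameter scaling matters more when gradient magnitudes differ widely across parameters.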

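The experience of the experts is also worth reading: 26 deep learning lessons from Yoshua Bengio and other experts http://www.csdn.net/article/2015-09-16/2825716

Specialized Research

After learning all of the above, it is time for concrete research projects. You can look for papers that interest you in this GitHub repository: https://github.com/terryum/awesome-deep-learning-papers Below are some excellent papers related to convolutional neural networks.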

Understanding / Generalization / Transfer

Distilling the knowledge in a neural network (2015), G. Hinton et al. http://arxiv.org/pdf/1503.02531

Deep neural networks are easily fooled: High confidence predictions for unrecognizable images (2015), A. Nguyen et al. http://arxiv.org/pdf/1412.1897

How transferable are features in deep neural networks? (2014), J. Yosinski et al. http://papers.nips.cc/paper/5347-how-transferable-are-features-in-deep-neural-networks.pdf

CNN features off-the-Shelf: An astounding baseline for recognition (2014), A. Razavian et al. http://www.cv-foundation.org//openaccess/content_cvpr_workshops_2014/W15/papers/Razavian_CNN_Features_Off-the-Shelf_2014_CVPR_paper.pdf

Learning and transferring mid-Level image representations using convolutional neural networks (2014), M. Oquab et al. http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Oquab_Learning_and_Transferring_2014_CVPR_paper.pdf

Visualizing and understanding convolutional networks (2014), M. Zeiler and R. Fergus http://arxiv.org/pdf/1311.2901

Decaf: A deep convolutional activation feature for generic visual recognition (2014), J. Donahue et al. http://arxiv.org/pdf/1310.1531

Optimization / Training Techniques

Training very deep networks (2015), R. Srivastava et al. http://papers.nips.cc/paper/5850-training-very-deep-networks.pdf

Batch normalization: Accelerating deep network training by reducing internal covariate shift (2015), S. Ioffe and C. Szegedy http://arxiv.org/pdf/1502.03167

Delving deep into rectifiers: Surpassing human-level performance on imagenet classification (2015), K. He et al. http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/He_Delving_Deep_into_ICCV_2015_paper.pdf

Dropout: A simple way to prevent neural networks from overfitting (2014), N. Srivastava et al. http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf

Adam: A method for stochastic optimization (2014), D. Kingma and J. Ba http://arxiv.org/pdf/1412.6980

Improving neural networks by preventing co-adaptation of feature detectors (2012), G. Hinton et al. http://arxiv.org/pdf/1207.0580.pdf

Random search for hyper-parameter optimization (2012), J. Bergstra and Y. Bengio http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a

Convolutional Neural Network Models

Rethinking the inception architecture for computer vision (2016), C. Szegedy et al. http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Szegedy_Rethinking_the_Inception_CVPR_2016_paper.pdf

Inception-v4, inception-resnet and the impact of residual connections on learning (2016), C. Szegedy et al. http://arxiv.org/pdf/1602.07261

Identity Mappings in Deep Residual Networks (2016), K. He et al. https://arxiv.org/pdf/1603.05027v2.pdf

Deep residual learning for image recognition (2016), K. He et al. http://arxiv.org/pdf/1512.03385

Spatial transformer network (2015), M. Jaderberg et al., http://papers.nips.cc/paper/5854-spatial-transformer-networks.pdf

Going deeper with convolutions (2015), C. Szegedy et al. http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf

Very deep convolutional networks for large-scale image recognition (2014), K. Simonyan and A. Zisserman http://arxiv.org/pdf/1409.1556

Return of the devil in the details: delving deep into convolutional nets (2014), K. Chatfield et al. http://arxiv.org/pdf/1405.3531

OverFeat: Integrated recognition, localization and detection using convolutional networks (2013), P. Sermanet et al. http://arxiv.org/pdf/1312.6229

Maxout networks (2013), I. Goodfellow et al. http://arxiv.org/pdf/1302.4389v4

Network in network (2013), M. Lin et al. http://arxiv.org/pdf/1312.4400

ImageNet classification with deep convolutional neural networks (2012), A. Krizhevsky et al. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

Image: Segmentation / Object Detection

You only look once: Unified, real-time object detection (2016), J. Redmon et al. http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf

Fully convolutional networks for semantic segmentation (2015), J. Long et al. http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Long_Fully_Convolutional_Networks_2015_CVPR_paper.pdf

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (2015), S. Ren et al. http://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks.pdf

Fast R-CNN (2015), R. Girshick http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Girshick_Fast_R-CNN_ICCV_2015_paper.pdf

Rich feature hierarchies for accurate object detection and semantic segmentation (2014), R. Girshick et al. http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Girshick_Rich_Feature_Hierarchies_2014_CVPR_paper.pdf

Spatial pyramid pooling in deep convolutional networks for visual recognition (2014), K. He et al. http://arxiv.org/pdf/1406.4729

Semantic image segmentation with deep convolutional nets and fully connected CRFs, L. Chen et al. https://arxiv.org/pdf/1412.7062

Learning hierarchical features for scene labeling (2013), C. Farabet et al. https://hal-enpc.archives-ouvertes.fr/docs/00/74/20/77/PDF/farabet-pami-13.pdf

Image / Video / Etc

Image Super-Resolution Using Deep Convolutional Networks (2016), C. Dong et al. https://arxiv.org/pdf/1501.00092v3.pdf

A neural algorithm of artistic style (2015), L. Gatys et al. https://arxiv.org/pdf/1508.06576

Deep visual-semantic alignments for generating image descriptions (2015), A. Karpathy and L. Fei-Fei http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Karpathy_Deep_Visual-Semantic_Alignments_2015_CVPR_paper.pdf

Show, attend and tell: Neural image caption generation with visual attention (2015), K. Xu et al. http://arxiv.org/pdf/1502.03044

Show and tell: A neural image caption generator (2015), O. Vinyals et al. http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Vinyals_Show_and_Tell_2015_CVPR_paper.pdf

Long-term recurrent convolutional networks for visual recognition and description (2015), J. Donahue et al. http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Donahue_Long-Term_Recurrent_Convolutional_2015_CVPR_paper.pdf

VQA: Visual question answering (2015), S. Antol et al. http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Antol_VQA_Visual_Question_ICCV_2015_paper.pdf

DeepFace: Closing the gap to human-level performance in face verification (2014), Y. Taigman et al. http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Taigman_DeepFace_Closing_the_2014_CVPR_paper.pdf

Large-scale video classification with convolutional neural networks (2014), A. Karpathy et al. http://vision.stanford.edu/pdf/karpathy14.pdf

Two-stream convolutional networks for action recognition in videos (2014), K. Simonyan et al. http://papers.nips.cc/paper/5353-two-stream-convolutional-networks-for-action-recognition-in-videos.pdf

3D convolutional neural networks for human action recognition (2013), S. Ji et al. http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_JiXYY10.pdf

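For more, see another deep-learning-related article on Zhuanzhi: http://www.zhuanzhi.ai/topic/2001228999615594/awesome It covers many more specialized areas and the papers related to them, which are not repeated here.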
