Deep learning is an experimental science whose practice has raced far ahead of its theory, so this article interleaves theory with hands-on practice.
First, you can browse the related articles under the Zhuanzhi deep learning topic.
For convolutional neural networks, the following articles cover the basic concepts:
How do convolutional neural networks work, intuitively? https://www.zhihu.com/question/39022858
Technical primer: understanding convolutional neural networks (CNNs) in one article http://dataunion.org/11692.html
Deep learning pioneer Yann LeCun explains convolutional neural networks https://www.leiphone.com/news/201608/zaB48AcZ1AFm1TaP.html
CNN notes: an accessible explanation of convolutional neural networks https://www.2cto.com/kf/201607/522441.html
Once the basic concepts are in place, you also need an intuitive feel for CNNs, and deep learning visualization is an excellent way to build one:
Chinese notes on Visualizing and Understanding Convolutional Networks http://www.gageet.com/2014/10235.php
The English original, for those interested: https://arxiv.org/abs/1311.2901
Before starting hands-on work, it is worth playing with the TensorFlow Playground at http://playground.tensorflow.org/ (guide: http://f.dataguru.cn/article-9324-1.html)
After that you can experiment on your own machine. First of all, a GPU is a must:
Installing CUDA: http://blog.csdn.net/u010480194/article/details/54287335
Installing cuDNN: http://blog.csdn.net/lucifer_zzq/article/details/76675239
The next step is to choose a framework that suits you:
What is the hottest deep learning framework right now? https://www.zhihu.com/question/52517062?answer_deleted_redirect=true
In depth: a comparison of mainstream deep learning frameworks, which one suits you best? http://mp.weixin.qq.com/s?__biz=MzA3MzI4MjgzMw==&mid=2650719118&idx=2&sn=fad8b7cad70cc6a227f88ae07a89db66#rd
There is also a GitHub project dedicated to comparing frameworks, updated fairly often: https://github.com/hunkim/DeepLearningStars
If you suffer from choice paralysis, two frameworks are (non-authoritatively) recommended: TensorFlow and PyTorch. TensorFlow's visualization and production tooling are excellent, while PyTorch is more flexible and very pleasant to use.
TensorFlow website: http://www.tensorflow.org/
PyTorch website: http://pytorch.org/
Following the official installation instructions step by step usually works without trouble; if you do hit a problem, search https://stackoverflow.com/, where a solution can almost always be found.
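After installing, a quick sanity check confirms that both the framework and the GPU stack work. Here is a minimal sketch for PyTorch (assuming it was installed with CUDA support; TensorFlow users can check tf.__version__ analogously):

import torch

print(torch.__version__)                  # framework version
print(torch.cuda.is_available())          # True if CUDA and cuDNN are set up correctly
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the first visible GPU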
You also need to get familiar with one important tool, GitHub (https://github.com/): it is convenient both for managing your own code and for learning from other people's. For a tutorial, see this answer: https://www.zhihu.com/question/20070065
If you would rather not read all that, an IDE can help with code management, e.g. PyCharm http://www.jetbrains.com/pycharm/ (tutorial: http://blog.csdn.net/u013088062/article/details/50349833)
An interactive visualization tool is also very important; the standout here is Jupyter Notebook http://python.jobbole.com/87527/?repeat=w3tc
With all this preparation done, you can start the introductory tutorials. The official ones are actually very good, but if wading through English is a chore, the following are worth a look.
TensorFlow
How do I get started with TensorFlow? https://www.zhihu.com/question/49909565
Getting started with TensorFlow: http://hacker.duanshishi.com/?p=1639
Google's official tutorials are actually quite complete; if you prefer not to read English, there is a Chinese translation: http://wiki.jikexueyuan.com/project/tensorflow-zh/
PyTorch
PyTorch deep learning: a 60-minute introduction (translation) https://zhuanlan.zhihu.com/p/25572330
How should a beginner get started with PyTorch? https://www.zhihu.com/question/55720139
Super simple! A PyTorch tutorial, part 1: Tensor http://www.jianshu.com/p/5ae644748f21
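As a taste of what these tutorials start from, here is a minimal sketch of basic tensor operations in PyTorch (the shapes and values are arbitrary, purely for illustration):

import torch

x = torch.rand(2, 3)    # a 2x3 tensor with uniform random entries
y = torch.ones(2, 3)    # a 2x3 tensor of ones
z = x + y               # elementwise addition
print(z)
print(z.size())         # torch.Size([2, 3])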
If you are not yet comfortable with Python, these two tutorials may help. Python 2: http://www.runoob.com/python/python-tutorial.html, Python 3: http://www.runoob.com/python3/python3-tutorial.html
If you are only dabbling and do not want to spend much time on a framework, try Keras:
Getting started with Keras: http://www.360doc.com/content/17/0624/12/1489589_666148811.shtml
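Keras trades flexibility for brevity: a complete model takes just a few lines. A minimal sketch of a small CNN classifier (the layer sizes are arbitrary; assumes the standalone keras package):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),  # 32 3x3 filters
    MaxPooling2D((2, 2)),                                            # downsample by 2
    Flatten(),
    Dense(10, activation='softmax'),                                 # 10-way classifier
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()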
After the introductions above, you should have a basic idea of what a convolutional neural network is and how to implement one. Going further likewise involves two aspects.
First, the backpropagation algorithm. You can get by without it at first, since the common frameworks all provide automatic differentiation, but to make real progress you must understand it thoroughly. Tutorial: http://blog.csdn.net/u014313009/article/details/51039334
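To make the chain rule concrete, here is a minimal sketch of the forward and backward passes for a single sigmoid neuron with a squared-error loss, in plain NumPy (the network and numbers are made up for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0])   # input
w = np.array([0.3, 0.8])    # weights
b = 0.1                     # bias
t = 1.0                     # target

# forward pass
z = w @ x + b
y = sigmoid(z)
loss = 0.5 * (y - t) ** 2

# backward pass: apply the chain rule stage by stage
dloss_dy = y - t              # dL/dy
dy_dz = y * (1.0 - y)         # sigmoid'(z), expressed via y
dloss_dz = dloss_dy * dy_dz   # dL/dz
grad_w = dloss_dz * x         # dL/dw
grad_b = dloss_dz             # dL/db

# one step of gradient descent
lr = 0.1
w = w - lr * grad_w
b = b - lr * grad_b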
Next, get familiar with a few classic CNN models:
Paper: ImageNet Classification with Deep Convolutional Neural Networks http://ml.informatik.uni-freiburg.de/former/_media/teaching/ws1314/dl/talk_simon_group2.pdf
Explanation: http://blog.csdn.net/u014088052/article/details/50898842
Code: TensorFlow https://github.com/kratzert/finetune_alexnet_with_tensorflow, PyTorch https://github.com/aaron-xichen/pytorch-playground
Paper: Deep Residual Learning for Image Recognition https://arxiv.org/abs/1512.03385
Explanation: http://blog.csdn.net/wspba/article/details/56019373
Code: TensorFlow https://github.com/ry/tensorflow-resnet, PyTorch https://github.com/isht7/pytorch-deeplab-resnet
Paper: Densely Connected Convolutional Networks https://arxiv.org/pdf/1608.06993.pdf
Explanation: http://blog.csdn.net/u014380165/article/details/75142664
Code: original https://github.com/liuzhuang13/DenseNet, TensorFlow https://github.com/YixuanLi/densenet-tensorflow, PyTorch https://github.com/bamos/densenet.pytorch
It is best to read the explanation first and then the source code: this deepens your understanding of the model, and other people's code is a great place to pick up new framework tricks. If you just want to poke at one of these models before reading a full repository, see the sketch below.
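For example, torchvision ships reference implementations of several of these architectures, so you can load and run one in a few lines. A minimal sketch (assuming torchvision is installed; the input is a random tensor, just to show the shapes):

import torch
import torchvision.models as models

model = models.resnet18(pretrained=True)   # downloads ImageNet weights on first use
model.eval()                               # inference mode (fixes BatchNorm/Dropout behavior)

x = torch.randn(1, 3, 224, 224)            # a fake batch of one 224x224 RGB image (NCHW)
with torch.no_grad():
    logits = model(x)                      # shape (1, 1000): ImageNet class scores
print(logits.argmax(dim=1))                # index of the predicted class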
Of course, we should not stay at the surface. A very famous book is recommended here, Deep Learning; a Chinese translation is at https://github.com/exacity/deeplearningbook-chinese
More fundamental theoretical work is, for now, still largely missing.
If you have the patience, Stanford's newly launched course is worth following: https://www.bilibili.com/video/av9156347/
In actual practice there are a great many details to learn. Before diving in, it is best to read up on tuning tricks:
What tricks are there for tuning deep learning models? https://www.zhihu.com/question/25097993
There used to be a tuning bible, Neural Networks: Tricks of the Trade, but it is too dated to recommend now.
Modules that used to be common, such as dropout and LRN, are seeing less and less use lately, so we will not dwell on them. For regularization, BatchNorm is recommended: https://www.zhihu.com/question/38102762. The idea is simple and it works well.
Although training is already very stable with BatchNorm, it is still worth learning gradient clipping: http://blog.csdn.net/zyf19930610/article/details/71743291
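In PyTorch, clipping is a single call between the backward pass and the optimizer step. A minimal sketch (the model, the batch, and the max-norm of 5.0 are placeholders, not recommended values):

import torch
import torch.nn as nn

model = nn.Linear(10, 1)                       # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()
x, t = torch.randn(4, 10), torch.randn(4, 1)   # placeholder batch

optimizer.zero_grad()
loss = criterion(model(x), t)
loss.backward()
# rescale all gradients so that their global norm is at most 5.0
nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
optimizer.step()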
Activation functions are another important point, although in convolutional networks you can essentially use ReLU without a second thought: http://www.cnblogs.com/neopenx/p/4453161.html. It is cheap to compute, and ReLU+BatchNorm is close to a universal default. That said, specific tasks still call for specific analysis; GANs, for instance, are not well served by such a blunt choice of activation.
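The ubiquitous Conv-BN-ReLU ordering looks like this in PyTorch; a minimal sketch (the channel counts are arbitrary):

import torch.nn as nn

# the standard Conv -> BatchNorm -> ReLU building block
block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),  # bias is redundant before BN
    nn.BatchNorm2d(64),                                      # normalize each channel over the batch
    nn.ReLU(inplace=True),                                   # cheap, non-saturating activation
)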
With the architecture essentially settled, what remains is optimization. There are many optimization algorithms; the most common are SGD and Adam.
An overview of optimization algorithms: http://www.mamicode.com/info-detail-1931210.html
A good optimizer converges faster or reaches a better solution, but in most experiments SGD and Adam are enough.
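Switching between them in PyTorch is a one-line change; a minimal sketch (the learning rates and momentum are common defaults, not tuned values):

import torch
import torch.nn as nn

model = nn.Linear(10, 2)   # placeholder model

# SGD with momentum: often generalizes well but needs more learning-rate tuning
sgd = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Adam: adaptive per-parameter step sizes, a robust default for quick experiments
adam = torch.optim.Adam(model.parameters(), lr=1e-3)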
The masters' experience is also worth reading: 26 deep learning lessons from Yoshua Bengio and other leading researchers http://www.csdn.net/article/2015-09-16/2825716
Once you have worked through all of the above, it is time for concrete research projects. You can look for papers that interest you in this GitHub repository: https://github.com/terryum/awesome-deep-learning-papers. Below is a selection of excellent papers related to convolutional neural networks.
Distilling the knowledge in a neural network (2015), G. Hinton et al. http://arxiv.org/pdf/1503.02531
Deep neural networks are easily fooled: High confidence predictions for unrecognizable images (2015), A. Nguyen et al. http://arxiv.org/pdf/1412.1897
How transferable are features in deep neural networks? (2014), J. Yosinski et al. http://papers.nips.cc/paper/5347-how-transferable-are-features-in-deep-neural-networks.pdf
CNN features off-the-Shelf: An astounding baseline for recognition (2014), A. Razavian et al. http://www.cv-foundation.org//openaccess/content_cvpr_workshops_2014/W15/papers/Razavian_CNN_Features_Off-the-Shelf_2014_CVPR_paper.pdf
Learning and transferring mid-Level image representations using convolutional neural networks (2014), M. Oquab et al. http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Oquab_Learning_and_Transferring_2014_CVPR_paper.pdf
Visualizing and understanding convolutional networks (2014), M. Zeiler and R. Fergus http://arxiv.org/pdf/1311.2901
Decaf: A deep convolutional activation feature for generic visual recognition (2014), J. Donahue et al. http://arxiv.org/pdf/1310.1531
Training very deep networks (2015), R. Srivastava et al. http://papers.nips.cc/paper/5850-training-very-deep-networks.pdf
Batch normalization: Accelerating deep network training by reducing internal covariate shift (2015), S. Ioffe and C. Szegedy http://arxiv.org/pdf/1502.03167
Delving deep into rectifiers: Surpassing human-level performance on imagenet classification (2015), K. He et al. http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/He_Delving_Deep_into_ICCV_2015_paper.pdf
Dropout: A simple way to prevent neural networks from overfitting (2014), N. Srivastava et al. http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf
Adam: A method for stochastic optimization (2014), D. Kingma and J. Ba http://arxiv.org/pdf/1412.6980
Improving neural networks by preventing co-adaptation of feature detectors (2012), G. Hinton et al. http://arxiv.org/pdf/1207.0580.pdf
Random search for hyper-parameter optimization (2012), J. Bergstra and Y. Bengio http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a
Rethinking the inception architecture for computer vision (2016), C. Szegedy et al. http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Szegedy_Rethinking_the_Inception_CVPR_2016_paper.pdf
Inception-v4, inception-resnet and the impact of residual connections on learning (2016), C. Szegedy et al. http://arxiv.org/pdf/1602.07261
Identity Mappings in Deep Residual Networks (2016), K. He et al. https://arxiv.org/pdf/1603.05027v2.pdf
Deep residual learning for image recognition (2016), K. He et al. http://arxiv.org/pdf/1512.03385
Spatial transformer network (2015), M. Jaderberg et al. http://papers.nips.cc/paper/5854-spatial-transformer-networks.pdf
Going deeper with convolutions (2015), C. Szegedy et al. http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf
Very deep convolutional networks for large-scale image recognition (2014), K. Simonyan and A. Zisserman http://arxiv.org/pdf/1409.1556
Return of the devil in the details: delving deep into convolutional nets (2014), K. Chatfield et al. http://arxiv.org/pdf/1405.3531
OverFeat: Integrated recognition, localization and detection using convolutional networks (2013), P. Sermanet et al. http://arxiv.org/pdf/1312.6229
Maxout networks (2013), I. Goodfellow et al. http://arxiv.org/pdf/1302.4389v4
Network in network (2013), M. Lin et al. http://arxiv.org/pdf/1312.4400
ImageNet classification with deep convolutional neural networks (2012), A. Krizhevsky et al. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
You only look once: Unified, real-time object detection (2016), J. Redmon et al. http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf
Fully convolutional networks for semantic segmentation (2015), J. Long et al. http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Long_Fully_Convolutional_Networks_2015_CVPR_paper.pdf
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (2015), S. Ren et al. http://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks.pdf
Fast R-CNN (2015), R. Girshick http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Girshick_Fast_R-CNN_ICCV_2015_paper.pdf
Rich feature hierarchies for accurate object detection and semantic segmentation (2014), R. Girshick et al. http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Girshick_Rich_Feature_Hierarchies_2014_CVPR_paper.pdf
Spatial pyramid pooling in deep convolutional networks for visual recognition (2014), K. He et al. http://arxiv.org/pdf/1406.4729
Semantic image segmentation with deep convolutional nets and fully connected CRFs (2014), L. Chen et al. https://arxiv.org/pdf/1412.7062
Learning hierarchical features for scene labeling (2013), C. Farabet et al. https://hal-enpc.archives-ouvertes.fr/docs/00/74/20/77/PDF/farabet-pami-13.pdf
Image Super-Resolution Using Deep Convolutional Networks (2016), C. Dong et al. https://arxiv.org/pdf/1501.00092v3.pdf
A neural algorithm of artistic style (2015), L. Gatys et al. https://arxiv.org/pdf/1508.06576
Deep visual-semantic alignments for generating image descriptions (2015), A. Karpathy and L. Fei-Fei http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Karpathy_Deep_Visual-Semantic_Alignments_2015_CVPR_paper.pdf
Show, attend and tell: Neural image caption generation with visual attention (2015), K. Xu et al. http://arxiv.org/pdf/1502.03044
Show and tell: A neural image caption generator (2015), O. Vinyals et al. http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Vinyals_Show_and_Tell_2015_CVPR_paper.pdf
Long-term recurrent convolutional networks for visual recognition and description (2015), J. Donahue et al. http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Donahue_Long-Term_Recurrent_Convolutional_2015_CVPR_paper.pdf
VQA: Visual question answering (2015), S. Antol et al. http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Antol_VQA_Visual_Question_ICCV_2015_paper.pdf
DeepFace: Closing the gap to human-level performance in face verification (2014), Y. Taigman et al. http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Taigman_DeepFace_Closing_the_2014_CVPR_paper.pdf
Large-scale video classification with convolutional neural networks (2014), A. Karpathy et al. http://vision.stanford.edu/pdf/karpathy14.pdf
Two-stream convolutional networks for action recognition in videos (2014), K. Simonyan et al. http://papers.nips.cc/paper/5353-two-stream-convolutional-networks-for-action-recognition-in-videos.pdf
3D convolutional neural networks for human action recognition (2013), S. Ji et al. http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_JiXYY10.pdf
For more, see another Zhuanzhi article on deep learning, http://www.zhuanzhi.ai/topic/2001228999615594/awesome, which covers many more specialized subfields and their key papers; we will not repeat them here.
This paper studies the joint learning of convolutional neural networks (CNNs) and vision-language pretrained Transformers (VLPT), aiming to learn cross-modal alignment from millions of image-text pairs. Most current approaches first extract salient regions from an image and then align the regions with words one by one. Since region-based visual features typically represent only parts of an image, it is challenging for existing vision-language models to fully understand the semantics of the paired natural language. This paper proposes SOHO ("Seeing Out of tHe bOx"), which takes the whole image as input and learns vision-language representations in an end-to-end manner. SOHO requires no bounding-box annotations, which makes inference 10 times faster than region-based approaches. In particular, SOHO learns to extract comprehensive yet compact image features through a visual dictionary (VD), which facilitates cross-modal understanding. Extensive experimental results also validate the effectiveness of SOHO.
https://www.zhuanzhi.ai/paper/a8c52c4b641c0a5bc840a955b6258b39
Partial voluming (PV) is arguably the last crucial unsolved problem in Bayesian segmentation of brain MRI with probabilistic atlases. PV occurs when voxels contain multiple tissue classes, giving rise to image intensities that may not be representative of any one of the underlying classes. PV is particularly problematic for segmentation when there is a large resolution gap between the atlas and the test scan, e.g., when segmenting clinical scans with thick slices, or when using a high-resolution atlas. In this work, we present PV-SynthSeg, a convolutional neural network (CNN) that tackles this problem by directly learning a mapping between (possibly multi-modal) low resolution (LR) scans and underlying high resolution (HR) segmentations. PV-SynthSeg simulates LR images from HR label maps with a generative model of PV, and can be trained to segment scans of any desired target contrast and resolution, even for previously unseen modalities where neither images nor segmentations are available at training. PV-SynthSeg does not require any preprocessing, and runs in seconds. We demonstrate the accuracy and flexibility of the method with extensive experiments on three datasets and 2,680 scans. The code is available at https://github.com/BBillot/SynthSeg.