Recommended | A Summary of Andrew Ng's Computer Vision Course

November 23, 2017 · 全球人工智能



Computer Vision by Andrew Ng—11 Lessons Learned 

I recently completed Andrew Ng’s computer vision course on Coursera. Ng does an excellent job of explaining many of the complex ideas required to optimize any computer vision task. My favourite component of the course was the neural style transfer section (see lesson 11), which allows you to create artwork that combines the style of Claude Monet with the content of whichever image you would like. This is an example of what you can do: 

[Image: created in week 4 of the course; Ng's face combined with the style of Rain Princess by Leonid Afremov.]

In this article, I will discuss 11 key lessons that I learned in the course. Note that this is the fourth course in the Deep Learning specialization released by deeplearning.ai. If you would like to learn about the previous 3 courses, I recommend you check out this blog. 


Lesson 1: Why computer vision is taking off 

Big data and algorithmic developments will cause the testing error of intelligent systems to converge to the Bayes optimal error. This will lead to AI surpassing human-level performance in all areas, including natural perception tasks. Open-source software such as TensorFlow allows you to use transfer learning to implement an object detection system for any object very rapidly. With transfer learning, you only need about 100–500 examples for the system to work relatively well. Manually labeling 100 examples isn’t too much work, so you’ll have a minimum viable product very quickly. 


Lesson 2: How convolution works 

Ng explains how to implement the convolution operator and shows how it can detect edges in an image. He also explains other filters, such as the Sobel filter, which puts more weight on the central pixels of the edge. Ng then explains that the weights of the filter should not be hand-designed but rather learned from data using gradient descent. 
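As a concrete illustration, here is a minimal NumPy sketch of the convolution operator applied with a Sobel filter; the toy image and function names are my own, not code from the course:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution (strictly cross-correlation, as in most DL libraries)."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Sobel filter for vertical edges: note the extra weight on the central row.
sobel_x = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]], dtype=float)

image = np.zeros((8, 8))
image[:, 4:] = 1.0                      # toy image with one sharp vertical edge
print(np.abs(conv2d(image, sobel_x)))   # strong response at the edge columns
```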


Ng also gives several philosophical reasons for why convolutions work so well in image recognition tasks, and outlines two concrete ones. The first is known as parameter sharing: a feature detector that is useful in one part of an image is probably useful in another part of the image. For example, an edge detector is probably useful in many parts of the image. Sharing parameters keeps the number of parameters small and also allows for robust translation invariance, the notion that a shifted cat is still a picture of a cat. 

The second idea is known as sparsity of connections: each output value is a function of only a small number of inputs (specifically, the filter size squared). Together, these two effects greatly reduce the number of parameters in the network and allow for faster training. 
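To see how much these two ideas buy you, here is a back-of-the-envelope comparison (my own illustrative arithmetic, not figures from the course):

```python
# A single conv layer: 16 filters of size 5x5 over a 3-channel input.
conv_params = (5 * 5 * 3 + 1) * 16             # 1,216 (weights + biases)

# A fully connected layer mapping a 32x32x3 input to the same 28x28x16 output.
dense_params = (32 * 32 * 3) * (28 * 28 * 16)  # ~38.5 million
print(conv_params, dense_params)
```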


Lesson 3: Why Padding? 

Padding is usually used to preserve the input size (i.e. the dimensions of the input and output are the same). It is also used so that pixels near the edges of the image contribute as much to the output as pixels near the centre. 
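The arithmetic behind this is worth writing down: for an n x n input, an f x f filter, padding p and stride s, the output size is floor((n + 2p - f) / s) + 1, and "same" padding picks p = (f - 1) / 2 so the size is preserved. A quick sketch with illustrative numbers:

```python
def conv_output_size(n, f, p=0, s=1):
    """Output width/height of a convolution: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

print(conv_output_size(6, 3))        # 4: a 'valid' convolution shrinks the input
print(conv_output_size(6, 3, p=1))   # 6: 'same' padding with p = (f - 1) / 2
```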


Lesson 4: Why Max Pooling? 

Through empirical research, max pooling has proven to be extremely effective in CNNs. By downsampling the image, we shrink the representation, which reduces computation and the number of parameters in later layers and makes the features more robust to small changes in scale or orientation. 
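As a sketch of the operation itself (illustrative NumPy, not course code):

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Keep only the strongest activation in each pooling window."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(x))  # [[ 5.  7.] [13. 15.]]
```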


Lesson 5: Classical network architectures 

Ng shows 3 classical network architectures, including LeNet-5, AlexNet and VGG-16. The main idea he presents is that effective networks often have layers with an increasing channel size and decreasing width and height. 
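A sketch of that pattern in today's Keras API (the layer sizes are illustrative, not an exact reproduction of any of the three networks):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Width and height halve at each pooling step while channel depth doubles.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),   # -> 112 x 112 x 64
    layers.Conv2D(128, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),   # -> 56 x 56 x 128
    layers.Conv2D(256, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),   # -> 28 x 28 x 256
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
```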


Lesson 6: Why ResNets work 

For a plain network, the training error does not monotonically decrease as the number of layers increases, due to vanishing and exploding gradients. ResNets add feed-forward skip connections, which allow you to train extremely deep networks without a drop in performance. 
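A minimal Keras sketch of one residual block (assuming the input already has `filters` channels so the shapes match):

```python
from tensorflow.keras import layers

def residual_block(x, filters):
    """The skip connection adds the input back to the block's output, so the
    block can fall back to the identity instead of degrading with depth."""
    shortcut = x
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.Add()([x, shortcut])      # the skip connection
    return layers.Activation("relu")(x)
```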


Lesson 7: Use Transfer Learning! 

Training large networks, such as Inception, from scratch can take weeks on a GPU. Instead, you should download the weights of a pretrained network and retrain only the last softmax layer (or the last few layers). This will greatly reduce training time. The reason this works is that earlier layers tend to capture concepts common to all images, such as edges and curvy lines. 
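In Keras this takes only a few lines. A sketch, assuming an ImageNet-pretrained ResNet-50 and a hypothetical 5-class task:

```python
import tensorflow as tf

# Download pretrained weights, freeze the convolutional base, and train
# only a new softmax head on your own classes.
base = tf.keras.applications.ResNet50(include_top=False,
                                      weights="imagenet", pooling="avg")
base.trainable = False   # keep the pretrained edge/texture/shape features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(5, activation="softmax"),  # 5 = your class count
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```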


Lesson 8: How to win computer vision competitions 

Ng explains that you should train several networks independently and average their outputs to get better performance. Data augmentation techniques such as randomly cropping images and flipping images about the horizontal and vertical axes may also help with performance. Finally, you should use an open source implementation and pretrained model to start and then fine-tune the parameters for your particular application. 
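Two small sketches of these tricks (function names and sizes are illustrative, not from the course):

```python
import numpy as np
import tensorflow as tf

# Ensembling: average the softmax outputs of independently trained models.
def ensemble_predict(models, x):
    return np.mean([m.predict(x) for m in models], axis=0)

# Data augmentation: random flips and crops applied to each training image.
def augment(image):                      # image: e.g. a 224 x 224 x 3 tensor
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_flip_up_down(image)
    return tf.image.random_crop(image, size=(200, 200, 3))
```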


Lesson 9: How to implement object detection 

Ng starts by explaining the idea of landmark detection in an image. Basically, these landmarks become a part of your training output examples. With some clever convolution manipulations, you get an output volume that tells you the probability that an object is in a certain region, along with the location of the object. He also explains how to evaluate the effectiveness of your object detection algorithm using the intersection over union measure. Finally, Ng puts all of these components together to explain the famous YOLO algorithm. 
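Intersection over union is simple enough to write out directly; boxes here are given as (x1, y1, x2, y2) corners:

```python
def iou(box_a, box_b):
    """Area of overlap divided by area of union; 1.0 means a perfect match."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.14
```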


Lesson 10: How to implement Face Recognition 

Facial recognition is a one-shot learning problem, since you may have only one example image with which to identify a person. The solution is to learn a similarity function that gives the degree of difference between two images. So if the images are of the same person, you want the function to output a small number, and vice versa for different people. 


The first solution Ng gives is known as a Siamese network. The idea is to feed images of two people through the same network separately and then compare the outputs. If the outputs are similar, then the people are probably the same. The network is trained so that if two input images are of the same person, the distance between their encodings is relatively small. 


The second solution he gives uses a triplet loss method. The idea is that you have a triplet of images (Anchor (A), Positive (P) and Negative (N)) and you train the network so that the output distance between A and P is much smaller than the distance between A and N. 
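A sketch of the triplet loss in TensorFlow (the margin value is a common illustrative choice, not necessarily the one used in the course):

```python
import tensorflow as tf

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Push d(A, P) to be smaller than d(A, N) by at least `margin`."""
    pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=-1)
    neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=-1)
    return tf.reduce_mean(tf.maximum(pos_dist - neg_dist + margin, 0.0))
```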

Lesson 11: How to create artwork using Neural Style Transfer 

Ng explains how to generate an image that combines the content of one image with the style of another. See the example at the start of this article. 

The key to neural style transfer is to understand what each layer in a convolutional network is learning to represent. It turns out that earlier layers learn simple features like edges, while later layers learn complex objects like faces, feet and cars. 


To build a neural style transfer image, you simply define a cost function that is a weighted combination of content similarity and style similarity. In particular, the cost function is: 


J(G) = alpha * J_content(C,G) + beta * J_style(S,G) 


where G is the generated image, C is the content image and S is the style image. The learning algorithm simply uses gradient descent to minimize the cost function with respect to the generated image, G. 


The steps are as follows: 

  1. Generate G randomly. 

  2. Use gradient descent to minimize J(G), i.e. update G := G - learning_rate * dJ(G)/dG. 

  3. Repeat step 2.
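A minimal sketch of that loop in TensorFlow. Note that gradient descent updates the pixels of G, not any network weights; `j_content`, `j_style`, `C`, `S`, `alpha` and `beta` are assumed to be defined as above:

```python
import tensorflow as tf

G = tf.Variable(tf.random.uniform(C.shape))      # step 1: start from noise
opt = tf.keras.optimizers.Adam(learning_rate=0.02)

for step in range(1000):                         # steps 2-3: descend on J(G)
    with tf.GradientTape() as tape:
        J = alpha * j_content(C, G) + beta * j_style(S, G)
    grads = tape.gradient(J, G)
    opt.apply_gradients([(grads, G)])            # updates the pixels of G
```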


Conclusion 

By completing this course, you will gain an intuitive understanding of a large chunk of the computer vision literature. The homework assignments also give you practice implementing these ideas yourself. You will not become an expert in computer vision after completing this course, but this course may kickstart a potential idea/career you may have in computer vision. 


If you have any interesting applications of computer vision you would like to share, let me know in the comments below. I would be happy to discuss potential collaboration on new projects. 


That’s all folks—if you’ve made it this far, please comment below and add me on LinkedIn. https://www.linkedin.com/in/ryanshrott/


Github :https://github.com/ryanshrott

