[Recommended] A compilation of neural-network debugging experience: what to do when your neural network isn't working?

September 5, 2017 · 机器学习研究会


Summary
 

Reposted from: 爱可可-爱生活

So you're developing the next great breakthrough in deep learning but you've hit an unfortunate setback: your neural network isn't working and you have no idea what to do. You go to your boss/supervisor but they don't know either - they are just as new to all of this as you - so what now? 

Well luckily for you, I'm here with a list of all the things you've probably done wrong, compiled from my own experience implementing neural networks and supervising other students with their projects:

  1. You Forgot to Normalize Your Data

  2. You Forgot to Check your Results

  3. You Forgot to Preprocess Your Data

  4. You Forgot to use any Regularization

  5. You Used a too Large Batch Size

  6. You Used an Incorrect Learning Rate

  7. You Used the Wrong Activation Function on the Final Layer

  8. Your Network contains Bad Gradients

  9. You Initialized your Network Weights Incorrectly

  10. You Used a Network that was too Deep

  11. You Used the Wrong Number of Hidden Units


You Forgot to Normalize Your Data

What?

When using neural networks it is essential to think about exactly how you are going to normalize your data. This is a non-negotiable step - there is very little chance of your network working at all without doing it correctly and with some care. Because this step is so essential and so well known in the deep learning community, it is very rarely mentioned in papers, and so it almost always trips up beginners.

How?

In general, normalization means this: subtract the mean from your data and divide your data by its standard deviation. Usually this is done individually for each input and output feature, but you may often want to do it for groups of features, or to treat the normalization of some features with special care.
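For concreteness, here is a minimal sketch of this per-feature standardization in NumPy; the array shapes and variable names are illustrative, not taken from the original post:

```python
import numpy as np

# Minimal sketch of per-feature standardization.
X_train = np.random.rand(1000, 16).astype(np.float32)   # (samples, features)

mean = X_train.mean(axis=0)   # per-feature mean
std = X_train.std(axis=0)     # per-feature standard deviation

X_train_norm = (X_train - mean) / std

# Re-use the *training* statistics for validation / test data so every
# split is measured in the same "units".
X_test = np.random.rand(200, 16).astype(np.float32)
X_test_norm = (X_test - mean) / std
```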

Why?

The primary reason we need to normalize our data is that most parts of a neural network pipeline assume that both the input and output data are distributed with a standard deviation of around one and a mean of roughly zero. These assumptions appear everywhere in deep learning literature, from weight initialization, to activation functions, to the optimization algorithms which train the network.

And?

An untrained neural network will typically output values roughly in the range -1 to 1. If you are expecting it to output values in some other range (for example, RGB images stored as bytes are in the range 0 to 255), you are going to have some problems. When starting training the network will be hugely unstable, as it will be producing values of -1 or 1 when values like 255 are expected - an error which is considered huge by most of the optimization algorithms used to train neural networks. This will produce huge gradients and your training error will likely explode. If somehow your training does not explode, then the first few stages of training will still be a waste, because the first thing the network will learn is simply to scale and shift the output values into roughly the desired range. If you normalize your data (in this case you could simply divide by 128 and subtract 1), then none of this will be an issue.
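As a concrete illustration of that fix, a hypothetical batch of byte-valued images could be rescaled like this (the shapes are made up for the example):

```python
import numpy as np

# Hypothetical batch of RGB images stored as bytes in [0, 255].
images = np.random.randint(0, 256, size=(32, 64, 64, 3), dtype=np.uint8)

# Divide by 128 and subtract 1, mapping pixel values into roughly [-1, 1],
# the range an untrained network tends to produce.
images_norm = images.astype(np.float32) / 128.0 - 1.0
```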

In general, the scale of features in the neural network will also govern their importance. If you have a feature in the output with a large scale, then it will generate a larger error compared to other features. Similarly, large-scale features in the input will dominate the network and cause larger changes downstream. For this reason it isn't always enough to use the automatic normalization of many neural network libraries, which blindly subtract the mean and divide by the standard deviation on a per-feature basis. You may have an input feature which typically ranges between 0.0 and 0.001 - is the range of this feature so small because it is an unimportant feature (in which case perhaps you don't want to re-scale it), or because it has some small unit in comparison to other features (in which case you do)? Similarly, be careful with features that have such a small range that their standard deviation becomes close to, or exactly, zero - these will produce instabilities or NaNs if you normalize them. It is important to think carefully about these issues - think about what each of your features really represents, and consider normalization as the process of making the "units" of all the input features equal. This is one of the few aspects of deep learning where I believe a human is really required in the loop.
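One simple way to avoid those NaNs, sketched here as an assumption rather than a prescribed recipe, is to floor the per-feature standard deviation before dividing; the threshold value is an arbitrary choice for illustration:

```python
import numpy as np

# Safer per-feature normalization: features whose standard deviation is
# (almost) zero are only mean-centred instead of being divided by a
# near-zero number, which would otherwise produce NaNs or huge values.
def safe_normalize(X, std_floor=1e-5):   # std_floor is an arbitrary choice
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std < std_floor, 1.0, std)
    return (X - mean) / std

X = np.random.rand(100, 8)
X[:, 3] = 0.5                 # a constant feature with zero variance
X_norm = safe_normalize(X)    # finite everywhere; column 3 is just centred
```

Features caught by the floor still deserve a manual decision about whether they should be re-scaled at all.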


You Forgot to Check your Results

What?

You've trained your network for a few epochs and you can see the error going down - success! Does this mean you've done it? PhD awarded? Unfortunately not - it is almost certain there is still something wrong with your code. It could be a bug in the data pre-processing, the training code, or even the inference. Just because the error goes down doesn't mean your network is learning anything useful.

How?

Checking that your data looks correct at each stage of the pipeline is incredibly important. Usually this means finding some way to visualize the results. If you have image data then it is easy - animation data can also be visualized without too much trouble. If you have something more exotic, you must find a way to sanity-check it, making sure it looks correct at each stage of your pre-processing, training, and inference pipeline, and comparing it to ground-truth data.
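For image data, such a sanity check can be as simple as plotting a few inputs next to the network's predictions and the ground truth. The helper below is just one possible sketch using matplotlib; the array names and shapes are assumptions, not part of the original post:

```python
import matplotlib.pyplot as plt

# Assumes `inputs`, `predictions` and `targets` are arrays of images with
# shape (N, H, W) or (N, H, W, 3).
def show_samples(inputs, predictions, targets, n=4):
    fig, axes = plt.subplots(n, 3, figsize=(9, 3 * n))
    for i in range(n):
        for ax, img, title in zip(axes[i],
                                  (inputs[i], predictions[i], targets[i]),
                                  ("input", "prediction", "ground truth")):
            ax.imshow(img)
            ax.set_title(title)
            ax.axis("off")
    plt.tight_layout()
    plt.show()
```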


Link:

http://theorangeduck.com/page/neural-network-not-working


Original post:

https://m.weibo.cn/1402400261/4148308836669338
