Reposted from: 爱可可-爱生活
So you're developing the next great breakthrough in deep learning but you've hit an unfortunate setback: your neural network isn't working and you have no idea what to do. You go to your boss/supervisor but they don't know either - they are just as new to all of this as you - so what now?
Well, luckily for you, I'm here with a list of all the things you've probably done wrong, compiled from my own experience implementing neural networks and supervising other students with their projects:
You Forgot to Normalize Your Data
You Forgot to Check Your Results
You Forgot to Preprocess Your Data
You Forgot to Use Any Regularization
You Used a Too Large Batch Size
You Used an Incorrect Learning Rate
You Used the Wrong Activation Function on the Final Layer
Your Network Contains Bad Gradients
You Initialized Your Network Weights Incorrectly
You Used a Network That Was Too Deep
You Used the Wrong Number of Hidden Units
When using neural networks it is essential to think carefully about exactly how you are going to normalize your data. This is a non-negotiable step - there is very little chance of your network working at all without doing this correctly and with some care. Since this step is so essential and so well known in the deep learning community, it is very rarely mentioned in papers, and so it almost always trips up beginners.
In general, normalization means this - subtract the mean from your data and divide your data by its standard deviation. Usually this is done individually for each input and output feature, but you may often want to do it for groups of features, or to treat the normalization of some features with special care.
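As a minimal sketch of that per-feature normalization (the array names and the dummy data below are just placeholders for illustration), this might look like:

```python
import numpy as np

def normalize(data, mean=None, std=None):
    # Normalize each feature (column) to zero mean and unit standard deviation.
    # The statistics should be computed on the training set and then reused,
    # unchanged, for validation and test data.
    if mean is None:
        mean = data.mean(axis=0)
    if std is None:
        std = data.std(axis=0)
    return (data - mean) / std, mean, std

# Hypothetical example: samples with five features on wildly different scales.
X_train = np.random.randn(1000, 5) * [1.0, 10.0, 100.0, 0.1, 1000.0]
X_test = np.random.randn(200, 5) * [1.0, 10.0, 100.0, 0.1, 1000.0]

X_train_norm, mean, std = normalize(X_train)
X_test_norm, _, _ = normalize(X_test, mean, std)  # reuse the training statistics
```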
The primary reason we need to normalize our data is that most parts of a neural network pipeline assume that both the input and output data are distributed with a standard deviation of around one and a mean of roughly zero. These assumptions appear everywhere in the deep learning literature, from weight initialization to activation functions to the optimization algorithms which train the network.
An untrained neural network will typically output values roughly in the range -1 to 1. If you are expecting it to output values in some other range (for example RGB images, which are stored as bytes in the range 0 to 255), you are going to have some problems. When training starts, the network will be hugely unstable, as it will be producing values of -1 or 1 when values like 255 are expected - an error which is considered huge by most optimization algorithms used to train neural networks. This will produce huge gradients and your training error will likely explode. If somehow your training does not explode, then the first few stages of training will still be a waste, as the first thing the network will learn is to scale and shift the output values into roughly the desired range. If you normalize your data (in this case you could simply divide by 128 and subtract 1) then none of this will be an issue.
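For the RGB case above, a minimal sketch of that "divide by 128 and subtract 1" mapping (the function names and the random dummy image are just for illustration) could be:

```python
import numpy as np

def normalize_image(image):
    # Map byte pixel values from [0, 255] into roughly [-1, 1].
    return image.astype(np.float32) / 128.0 - 1.0

def denormalize_image(output):
    # Map network outputs from roughly [-1, 1] back to displayable bytes.
    return np.clip((output + 1.0) * 128.0, 0, 255).astype(np.uint8)

# Dummy RGB image just to check the round trip.
image = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
assert np.array_equal(denormalize_image(normalize_image(image)), image)
```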
In general, the scale of features in the neural network will also govern their importance. If you have a feature in the output with a large scale then it will generate a larger error compared to other features. Similarly, large-scale features in the input will dominate the network and cause larger changes downstream. For this reason it isn't always enough to use the automatic normalization of many neural network libraries, which blindly subtract the mean and divide by the standard deviation on a per-feature basis. You may have an input feature which typically ranges between 0.0 and 0.001 - is the range of this feature so small because it is an unimportant feature (in which case perhaps you don't want to re-scale it), or because it has some small unit in comparison to other features (in which case you do)?

Similarly, be careful with features that have such a small range that their standard deviation becomes close to, or exactly, zero - these will produce instabilities or NaNs if you normalize them. It is important to think carefully about these issues - think about what each of your features really represents, and consider normalization as the process of making the "units" of all the input features equal. This is one of the few aspects of Deep Learning where I believe a human is really required in the loop.
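As a sketch of how you might guard against those near-zero standard deviations (the epsilon threshold and the choice to leave such features unscaled are assumptions of mine, not a rule):

```python
import numpy as np

def safe_normalize(data, eps=1e-8):
    # Per-feature normalization that avoids dividing by a near-zero std.
    # Near-constant features are only centred, not re-scaled, since dividing
    # by ~0 would blow them up or produce NaNs.
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std = np.where(std < eps, 1.0, std)
    return (data - mean) / std
```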
You've trained your network for a few epochs and you can see the error going down - success! Does this mean you've done it? PhD awarded? Unfortunately not - it is almost certain there is still something wrong with your code. It could be a bug in the data pre-processing, the training code, or even the inference. Just because the error goes down doesn't mean your network is learning anything useful.
Checking that your data looks correct at each stage of the pipeline is incredibly important. Usually this means finding some way to visualize the results. If you have image data then it is easy - animation data can also be visualized without too much trouble. If you have something more exotic you must find a way to sanity check it, to make sure it looks correct at each stage of your pre-processing, training, and inference pipeline, and to compare it to ground truth data.
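As a rough sketch of what such a sanity check might look like for image data (using matplotlib; the array names and the random dummy batch are placeholders for whatever actually flows through your pipeline):

```python
import numpy as np
import matplotlib.pyplot as plt

def sanity_check(inputs, targets, predictions, n=8):
    # Plot a few raw inputs next to the ground truth and the network's output.
    # Call this on the exact arrays entering and leaving each pipeline stage,
    # not on a separately loaded "clean" copy of the data.
    fig, axes = plt.subplots(3, n, figsize=(2 * n, 6))
    for i in range(n):
        for row, (name, batch) in enumerate(
                [("input", inputs), ("target", targets), ("prediction", predictions)]):
            axes[row, i].imshow(batch[i].squeeze())
            axes[row, i].set_title(name, fontsize=8)
            axes[row, i].axis("off")
    plt.tight_layout()
    plt.show()

# Hypothetical usage with random "images" just to exercise the plotting code.
dummy = np.random.rand(8, 32, 32, 3)
sanity_check(dummy, dummy, dummy)
```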
Link:
http://theorangeduck.com/page/neural-network-not-working
Original post link:
https://m.weibo.cn/1402400261/4148308836669338