Reposted from: ArnetMiner
The network had been training for the last 12 hours. It all looked good: the gradients were flowing and the loss was decreasing. But then came the predictions: all zeroes, all background, nothing detected. “What did I do wrong?” — I asked my computer, who didn’t answer.
Where do you start checking when your model is outputting garbage (for example, predicting the mean of all outputs, or achieving really poor accuracy)?
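One quick symptom check worth running first (a numpy sketch of my own, not from the post's list): a model that has collapsed to predicting the mean, or to all background, shows near-zero variance across its outputs on a batch. The `preds` array here is a hypothetical stand-in for your model's outputs.

```python
import numpy as np

# Degeneracy check: if predictions barely vary across different inputs,
# the model has likely collapsed to a constant (e.g. the output mean).
# `preds` stands in for your model's outputs on one batch of inputs.
preds = np.array([0.501, 0.499, 0.500, 0.502, 0.498])

spread = preds.std()
collapsed = spread < 1e-2  # the threshold is a judgment call per task
print(f"output std: {spread:.4f} -> collapsed: {collapsed}")
```

If this fires, the problem is usually upstream of the data volume: labels, loss, or the final layer, which is what the list below helps you isolate.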
A network might not be training for a number of reasons. Over the course of many debugging sessions, I would often find myself doing the same checks. I've compiled my experience, along with the best ideas I've found elsewhere, into this handy list. I hope it will be of use to you, too.
0. How to use this guide?
I. Dataset issues
II. Data Normalization/Augmentation issues
III. Implementation issues
IV. Training issues
A lot of things can go wrong. But some of them are more likely to be broken than others. I usually start with this short list as an emergency first response:
Start with a simple model that is known to work for this type of data (for example, VGG for images). Use a standard loss if possible.
Turn off all bells and whistles, e.g. regularization and data augmentation.
If finetuning a model, double-check the preprocessing: it must match the original model's training preprocessing.
Verify that the input data is correct.
Start with a really small dataset (2–20 samples). Overfit on it and gradually add more data.
Gradually add back all the pieces that were omitted: augmentation/regularization, custom loss functions; then try more complex models.
If the steps above don’t do it, start going down the following big list and verify things one by one.
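The overfitting step above (2–20 samples) is the single most informative check: if the model cannot drive training error to zero on a handful of points, the bug is in the model or training loop, not in the data volume. A minimal numpy sketch of the idea, using toy synthetic data in place of a slice of your real dataset:

```python
import numpy as np

# Overfit-a-tiny-dataset sanity check: 4 linearly separable samples and
# a logistic-regression "model". Any working training loop should reach
# 100% training accuracy here within a few hundred steps.
rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 0, 1, 1])  # label is determined by the first feature

w = rng.normal(size=2)
b = 0.0
lr = 1.0

for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
    grad_logits = p - y                      # d(BCE loss)/d(logits)
    w -= lr * (X.T @ grad_logits) / len(y)
    b -= lr * grad_logits.mean()

train_acc = ((p > 0.5) == y).mean()
loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print(f"train accuracy: {train_acc:.2f}, loss: {loss:.4f}")
```

With a real network the recipe is the same: slice off a few samples, train until the loss is near zero, then gradually add data back.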
Link (may require a VPN in mainland China):
https://blog.slavv.com/37-reasons-why-your-neural-network-is-not-working-4020854bd607
Original post:
https://m.weibo.cn/1870858943/4139636177186495