Reposted from: 爱可可-爱生活
In previous posts, I've discussed how we can train neural networks using backpropagation with gradient descent. One of the key hyperparameters to set when training a neural network is the learning rate for gradient descent. As a reminder, this parameter scales the magnitude of our weight updates in order to minimize the network's loss function.
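As a rough sketch (not from the original post), the vanilla gradient descent update being described might look like the following in Python; the function name and example values are purely illustrative:

```python
import numpy as np

def gradient_descent_step(weights, gradients, learning_rate=0.01):
    """Move each weight against its gradient, scaled by the learning rate."""
    return weights - learning_rate * gradients

# Hypothetical weights and gradients, just to show the update in action.
w = np.array([0.5, -1.2])
g = np.array([0.1, -0.3])
w = gradient_descent_step(w, g, learning_rate=0.1)
```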
If your learning rate is set too low, training will progress very slowly, as you are making only tiny updates to the weights in your network. However, if your learning rate is set too high, it can cause undesirable divergent behavior in your loss function. I'll visualize these cases below; if you find these visuals hard to interpret, I'd recommend reading (at least) the first section of my post on gradient descent.
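To make these two failure modes concrete, here is a small illustrative sketch (not from the post) that runs gradient descent on the one-dimensional loss f(w) = w²: a tiny learning rate barely reduces the loss, while an oversized one makes it blow up.

```python
def minimize_quadratic(learning_rate, steps=10, w0=1.0):
    """Gradient descent on f(w) = w**2 (gradient 2*w); returns the loss at each step."""
    w = w0
    losses = []
    for _ in range(steps):
        w -= learning_rate * 2 * w  # gradient of w**2 is 2*w
        losses.append(w ** 2)
    return losses

print(minimize_quadratic(0.01))  # too low: loss shrinks very slowly
print(minimize_quadratic(1.5))   # too high: loss grows at every step, i.e. diverges
```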
Link:
https://www.jeremyjordan.me/nn-learning-rate/
Original post link:
https://m.weibo.cn/1402400261/4213361128315325