Dropout refers to temporarily dropping neural network units from the network with a certain probability during the training of a deep neural network. Dropout helps mitigate overfitting.
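
A minimal NumPy sketch of this mechanism, using "inverted" dropout so that no rescaling is needed at test time (the function name and arguments are illustrative, not taken from any particular library):

```python
import numpy as np

def dropout_forward(x, p_drop=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p_drop and rescale
    the survivors by 1/(1 - p_drop), so the expected activation is unchanged.
    At evaluation time the layer is the identity."""
    if not training or p_drop == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    keep_prob = 1.0 - p_drop
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

# Example: apply dropout to a batch of hidden activations.
h = np.random.default_rng(0).standard_normal((4, 8))
h_train = dropout_forward(h, p_drop=0.5, training=True)   # randomly zeroed, rescaled
h_eval  = dropout_forward(h, p_drop=0.5, training=False)  # unchanged
```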

Dropout is a widely used regularization technique that is often required to obtain state-of-the-art results for many architectures. This work shows that dropout introduces two distinct but intertwined regularization effects: an explicit effect (also studied in prior work) that arises because dropout modifies the expected training objective, and, perhaps surprisingly, an additional implicit effect that comes from the stochasticity of the dropout training updates. This implicit regularization effect is analogous to the effect of stochasticity in small-batch stochastic gradient descent. We disentangle the two effects through controlled experiments. We then derive analytic simplifications that characterize each effect in terms of the model's derivatives and the loss, for deep neural networks. We show that these simplified, analytic regularizers accurately capture the important aspects of dropout, demonstrating that they can faithfully replace dropout in practice.
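
The explicit effect described above can be made concrete in the simplest setting: for a linear model with squared loss, the expectation of the dropout objective over the masks has a closed form equal to the plain loss plus a data-dependent L2-style penalty (a standard observation about dropout on linear models, used here only as an illustration). A minimal sketch with synthetic data and illustrative variable names, checking the closed form against a Monte-Carlo average:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p_keep = 32, 5, 0.8
X = rng.standard_normal((n, d))   # synthetic inputs
y = rng.standard_normal(n)        # synthetic targets
w = rng.standard_normal(d)        # fixed linear weights

def dropout_loss(rng):
    """Squared error with inverted dropout applied to the inputs."""
    mask = (rng.random(X.shape) < p_keep) / p_keep
    return np.mean((y - (X * mask) @ w) ** 2)

# Monte-Carlo estimate of the expected (explicit) dropout objective.
mc = np.mean([dropout_loss(rng) for _ in range(50_000)])

# Closed form: plain squared loss plus a data-dependent L2-style penalty.
plain = np.mean((y - X @ w) ** 2)
penalty = (1 - p_keep) / p_keep * np.mean((X ** 2) @ (w ** 2))

print(mc, plain + penalty)  # the two values agree up to Monte-Carlo error
```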

Latest Papers

The question of how and why the phenomenon of mode connectivity occurs in training deep neural networks has gained remarkable attention in the research community. From a theoretical perspective, two possible explanations have been proposed: (i) the loss function has connected sublevel sets, and (ii) the solutions found by stochastic gradient descent are dropout stable. While these explanations provide insights into the phenomenon, their assumptions are not always satisfied in practice. In particular, the first approach requires the network to have one layer with order of $N$ neurons ($N$ being the number of training samples), while the second one requires the loss to be almost invariant after removing half of the neurons at each layer (up to some rescaling of the remaining ones). In this work, we improve both conditions by exploiting the quality of the features at every intermediate layer together with a milder over-parameterization condition. More specifically, we show that: (i) under generic assumptions on the features of intermediate layers, it suffices that the last two hidden layers have order of $\sqrt{N}$ neurons, and (ii) if subsets of features at each layer are linearly separable, then no over-parameterization is needed to show the connectivity. Our experiments confirm that the proposed condition ensures the connectivity of solutions found by stochastic gradient descent, even in settings where the previous requirements do not hold.
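
Condition (ii), dropout stability, can be probed directly on a trained network: remove half of the units in a hidden layer, rescale the surviving outgoing weights, and measure how much the loss changes. A minimal sketch on a synthetic two-layer ReLU regression problem (random, untrained weights here purely to show the mechanics; every name is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 256, 10, 64
X = rng.standard_normal((n, d))                  # synthetic inputs
y = rng.standard_normal(n)                       # synthetic targets
W1 = rng.standard_normal((d, h)) / np.sqrt(d)    # first-layer weights
w2 = rng.standard_normal(h) / np.sqrt(h)         # second-layer weights

def loss(W1, w2):
    """Mean squared error of a two-layer ReLU network."""
    return np.mean((y - np.maximum(X @ W1, 0.0) @ w2) ** 2)

def dropout_stability_gap(W1, w2, rng):
    """Remove half of the hidden units, rescale the remaining outgoing
    weights by 2, and return the resulting increase in loss."""
    keep = rng.permutation(W1.shape[1])[: W1.shape[1] // 2]
    return loss(W1[:, keep], 2.0 * w2[keep]) - loss(W1, w2)

print(dropout_stability_gap(W1, w2, rng))  # small gap <=> dropout stable
```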
