While dropout is known to be a successful regularization technique, insights into the mechanisms that lead to this success are still lacking. We introduce the concept of \emph{weight expansion}, an increase in the signed volume of a parallelotope spanned by the column or row vectors of the weight covariance matrix, and show that weight expansion is an effective means of improving generalization in a PAC-Bayesian setting. We provide a theoretical argument that dropout leads to weight expansion, together with extensive empirical support for the correlation between dropout and weight expansion. To support our hypothesis that weight expansion can be regarded as an \emph{indicator} of the enhanced generalization capability endowed by dropout, and not a mere by-product, we study other methods that achieve weight expansion (resp.\ contraction) and find that they generally lead to increased (resp.\ decreased) generalization ability. This suggests that dropout is an attractive regularizer because it is a computationally cheap method for obtaining weight expansion. This insight justifies the role of dropout as a regularizer, while paving the way for identifying regularizers that promise improved generalization through weight expansion.
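To make the volume notion concrete: the signed volume of the parallelotope spanned by the columns of a square matrix is its determinant, so one way to quantify the quantity described above is the log-determinant of a sample covariance of weight vectors. The following is a minimal illustrative sketch, not the authors' implementation. The function name `log_volume`, the synthetic Gaussian "weights", and the choice to estimate the covariance from sampled weight vectors are all assumptions made for this example.

```python
import numpy as np


def log_volume(weight_samples: np.ndarray) -> float:
    """Log-determinant of the sample covariance of weight vectors.

    weight_samples has shape (n_samples, n_weights): one flattened
    weight vector per row (a hypothetical setup for illustration).
    """
    cov = np.atleast_2d(np.cov(weight_samples, rowvar=False))
    sign, logdet = np.linalg.slogdet(cov)
    # A covariance matrix is positive semi-definite, so its determinant
    # is non-negative; a non-positive sign means it is (numerically) singular.
    return float(logdet) if sign > 0 else float("-inf")


rng = np.random.default_rng(0)

# Synthetic stand-in for sampled weights: 1000 draws of a 2-d weight vector.
base = rng.normal(size=(1000, 2))

# Linearly mix the two coordinates so that they become correlated while
# the population variance of each coordinate stays at 1.
mixing = np.array([[1.0, 0.9],
                   [0.0, np.sqrt(1.0 - 0.9 ** 2)]])
correlated = base @ mixing

# The mixing multiplies the covariance determinant by det(mixing)**2 = 0.19,
# so the correlated weights span a smaller volume (a "contraction").
contraction = log_volume(correlated) - log_volume(base)

# Scaling all weights by 2 multiplies the covariance by 4, and hence the
# determinant by 4**2 in two dimensions (an "expansion").
expansion = log_volume(2.0 * base) - log_volume(base)
```

In this toy setting, introducing correlation between weights shrinks the volume while the per-coordinate variance is unchanged in expectation, which is the sense in which decorrelated weights occupy a larger parallelotope.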