We observe that given two (compatible) classes of functions $\mathcal{F}$ and $\mathcal{H}$ with small capacity as measured by their uniform covering numbers, the capacity of the composition class $\mathcal{H} \circ \mathcal{F}$ can become prohibitively large or even unbounded. We then show that adding a small amount of Gaussian noise to the output of $\mathcal{F}$ before composing it with $\mathcal{H}$ can effectively control the capacity of $\mathcal{H} \circ \mathcal{F}$, offering a general recipe for modular design. To prove our results, we define new notions of uniform covering numbers for random functions with respect to the total variation and Wasserstein distances. We instantiate our results for the case of multi-layer sigmoid neural networks. Preliminary empirical results on the MNIST dataset indicate that the amount of noise required to improve over existing uniform bounds can be numerically negligible (i.e., element-wise i.i.d. Gaussian noise with standard deviation $10^{-240}$). The source code is available at https://github.com/fathollahpour/composition_noise.
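As a concrete illustration of the recipe described above, the following is a minimal NumPy sketch (not taken from the paper's repository) of evaluating $\mathcal{H} \circ \mathcal{F}$ with element-wise i.i.d. Gaussian noise injected between the two modules. The functions `f` and `h` and their weights are hypothetical stand-ins for members of $\mathcal{F}$ and $\mathcal{H}$; the default `sigma` mirrors the numerically negligible noise level quoted in the abstract.

```python
import numpy as np

def noisy_composition(h, f, x, sigma=1e-240, rng=None):
    """Evaluate h(f(x) + xi), where xi ~ N(0, sigma^2 I) element-wise.

    Sketch of the noise-injection recipe: perturb the output of f
    before feeding it to h. Not the authors' implementation.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = f(x)
    # float64 keeps sigma = 1e-240 representable (smallest normal
    # double is ~2.2e-308); in float32 this noise would underflow to 0.
    noise = rng.normal(0.0, sigma, size=z.shape)
    return h(z + noise)

# Hypothetical usage with sigmoid layers standing in for F and H:
if __name__ == "__main__":
    sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
    rng = np.random.default_rng(0)
    W1, W2 = rng.standard_normal((4, 3)), rng.standard_normal((2, 4))
    f = lambda x: sigmoid(W1 @ x)   # stand-in for a member of F
    h = lambda z: sigmoid(W2 @ z)   # stand-in for a member of H
    x = rng.standard_normal(3)
    print(noisy_composition(h, f, x, rng=rng))
```

With `sigma` this small the forward pass is numerically indistinguishable from the noiseless composition, which is the point of the abstract's observation: the noise needed to control the capacity of $\mathcal{H} \circ \mathcal{F}$ can be far below machine-visible scale.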