We consider functions from the real numbers to the real numbers, output by a neural network with one hidden activation layer, arbitrary width, and ReLU activation function. We assume that the parameters of the neural network are chosen at random according to various probability distributions, and compute the expected distribution of the points of non-linearity. We use these results to explain why the network may be biased towards outputting functions with simpler geometry, and why certain functions with low information-theoretic complexity are nonetheless hard for a neural network to approximate.
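As an informal illustration of the objects described above (not the paper's own computation), the sketch below samples random one-hidden-layer ReLU networks of the assumed form f(x) = Σ_j v_j ReLU(w_j x + b_j) + c and records the locations of their points of non-linearity, which for unit j occur at x = -b_j / w_j. The Gaussian parameter distribution and the helper name `kink_locations` are assumptions made purely for illustration.

```python
# Minimal empirical sketch: the expected distribution of the points of
# non-linearity ("kinks") of a random one-hidden-layer ReLU network,
# assuming standard Gaussian weights and biases (an illustrative choice).

import numpy as np

def kink_locations(width, rng):
    """Sample one random ReLU network and return the x-coordinates of its kinks."""
    w = rng.standard_normal(width)   # input-to-hidden weights w_j
    b = rng.standard_normal(width)   # hidden biases b_j
    # Unit j switches between its two linear regimes where w_j * x + b_j = 0,
    # i.e. at x = -b_j / w_j; skip degenerate units with w_j = 0.
    nonzero = w != 0.0
    return -b[nonzero] / w[nonzero]

rng = np.random.default_rng(0)
kinks = np.concatenate([kink_locations(width=64, rng=rng) for _ in range(1000)])

# Rough view of the empirical kink distribution, pooled over many random networks.
hist, edges = np.histogram(kinks, bins=np.linspace(-5, 5, 41), density=True)
for lo, hi, h in zip(edges[:-1], edges[1:], hist):
    print(f"[{lo:+.2f}, {hi:+.2f}): {h:.3f}")
```

Under this particular Gaussian assumption the ratio -b_j / w_j is Cauchy-distributed, so the kinks concentrate near the origin with heavy tails; other parameter distributions give other kink distributions, which is the kind of dependence the abstract refers to.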