We contribute to a better understanding of the class of functions that can be represented by a neural network with ReLU activations and a given architecture. Using techniques from mixed-integer optimization, polyhedral theory, and tropical geometry, we provide a mathematical counterbalance to the universal approximation theorems, which suggest that a single hidden layer is sufficient for learning tasks. In particular, we investigate whether the class of exactly representable functions strictly increases when more layers are added (with no restrictions on size). This question has potential impact on algorithmic and statistical aspects because of the insight it provides into the class of functions represented by neural hypothesis classes; however, to the best of our knowledge, it has not been investigated in the neural network literature. We also present upper bounds on the sizes of neural networks required to represent functions in these neural hypothesis classes.
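As an illustration of what exact representation at a fixed depth means (a standard construction, not specific to this work), the maximum of two numbers admits the one-hidden-layer ReLU representation
\[
  \max(x,y) \;=\; \mathrm{ReLU}(x) - \mathrm{ReLU}(-x) + \mathrm{ReLU}(y-x),
\]
and composing this with one further ReLU yields a two-hidden-layer representation of a three-term maximum,
\[
  \max(0,x,y) \;=\; \mathrm{ReLU}\bigl(\mathrm{ReLU}(x) - \mathrm{ReLU}(-x) + \mathrm{ReLU}(y-x)\bigr).
\]
Whether such functions can also be represented exactly with fewer hidden layers is the kind of depth question investigated here.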