We contribute to a better understanding of the class of functions that can be represented by a neural network with ReLU activations and a given architecture. Using techniques from mixed-integer optimization, polyhedral theory, and tropical geometry, we provide a mathematical counterbalance to the universal approximation theorems which suggest that a single hidden layer is sufficient for learning any function. In particular, we investigate whether the class of exactly representable functions strictly increases by adding more layers (with no restrictions on size). As a by-product of our investigations, we settle an old conjecture about piecewise linear functions by Wang and Sun (2005) in the affirmative. We also present upper bounds on the sizes of neural networks required to represent functions with logarithmic depth.
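As a brief, purely illustrative sketch (not a claim made in the abstract itself): every function computed by a ReLU network is continuous and piecewise linear, and depth governs how such pieces can be composed. For instance, the maximum of two numbers admits a one-hidden-layer ReLU representation via
\[
  \max\{x, y\} \;=\; x + \max\{0,\, y - x\},
\]
where the linear term $x$ can itself be carried through the hidden layer as $\max\{0, x\} - \max\{0, -x\}$. Applying this identity along a balanced pairwise tournament computes $\max\{x_1, \dots, x_n\}$ with $\lceil \log_2 n \rceil$ hidden layers, the kind of logarithmic-depth representation referred to above.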