Each year, deep learning demonstrates new and improved empirical results with deeper and wider neural networks. Meanwhile, with existing theoretical frameworks, it is difficult to analyze networks deeper than two layers without resorting to counting parameters or encountering sample complexity bounds that are exponential in depth. Perhaps it would be fruitful to analyze modern machine learning under a different lens. In this paper, we propose a novel information-theoretic framework, with its own notions of regret and sample complexity, for analyzing the data requirements of machine learning. Within this framework, we first work through classical examples such as scalar estimation and linear regression to build intuition and introduce general techniques. We then use the framework to study the sample complexity of learning from data generated by deep neural networks with ReLU activation units. For a particular prior distribution on weights, we establish sample complexity bounds that are simultaneously width-independent and linear in depth. This prior distribution gives rise to high-dimensional latent representations that, with high probability, admit reasonably accurate low-dimensional approximations. We conclude by corroborating our theoretical results with experimental analysis of random single-hidden-layer neural networks.