Each year, deep learning demonstrates new and improved empirical results with deeper and wider neural networks. Meanwhile, with existing theoretical frameworks it is difficult to analyze networks deeper than two layers without resorting to counting parameters or encountering sample complexity bounds that are exponential in depth. It may therefore be fruitful to analyze modern machine learning through a different lens. In this paper, we propose a novel information-theoretic framework, with its own notions of regret and sample complexity, for analyzing the data requirements of machine learning. Within this framework, we first work through classical examples such as scalar estimation and linear regression to build intuition and introduce general techniques. We then use the framework to study the sample complexity of learning from data generated by deep sign neural networks, deep ReLU neural networks, and deep networks that are infinitely wide but have a bounded sum of weights. For sign neural networks, we recover sample complexity bounds that follow from VC-dimension-based arguments. For the latter two neural network environments, we establish new results suggesting that the sample complexity of learning under these data-generating processes is at most linear and quadratic, respectively, in network depth.
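To make the notion of a neural-network data-generating process concrete, the following is a minimal sketch, not the paper's exact construction: it samples labeled data from a randomly initialized deep ReLU network, where the depth, width, and Gaussian weight prior are illustrative assumptions chosen for the example.

```python
import numpy as np

def sample_relu_network(depth, width, input_dim, rng):
    """Draw random weights for a deep ReLU network (illustrative Gaussian prior)."""
    dims = [input_dim] + [width] * (depth - 1) + [1]
    return [rng.standard_normal((dims[i + 1], dims[i])) / np.sqrt(dims[i])
            for i in range(depth)]

def forward(weights, x):
    """Compute the network output: ReLU on hidden layers, linear output layer."""
    h = x
    for W in weights[:-1]:
        h = np.maximum(W @ h, 0.0)
    return weights[-1] @ h

rng = np.random.default_rng(0)

# Fix one network draw as the "environment" that generates the data.
weights = sample_relu_network(depth=4, width=16, input_dim=8, rng=rng)

# A learner observes (x, y) pairs produced by this fixed network.
X = rng.standard_normal((100, 8))
y = np.array([forward(weights, x).item() for x in X])
```

Under this kind of setup, the sample-complexity question is how many such pairs a learner needs before its predictions approach those of the generating network.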