To advance deep learning methodologies in the next decade, a theoretical framework for reasoning about modern neural networks is needed. While efforts are increasing toward demystifying why deep learning is so effective, a comprehensive picture remains lacking, suggesting that a better theory is possible. We argue that a future deep learning theory should inherit three characteristics: a \textit{hierarchically} structured network architecture, parameters \textit{iteratively} optimized using stochastic gradient-based methods, and information from the data that evolves \textit{compressively}. As an instantiation, we integrate these characteristics into a graphical model called \textit{neurashed}. This model effectively explains some common empirical patterns in deep learning. In particular, neurashed enables insights into implicit regularization, information bottleneck, and local elasticity. Finally, we discuss how neurashed can guide the development of deep learning theories.