An important characteristic of neural networks is their ability to learn representations of the input data with effective features for prediction, which is believed to be a key factor in their superior empirical performance. To better understand the source and benefit of feature learning in neural networks, we consider learning problems motivated by practical data, where the labels are determined by a set of class-relevant patterns and the inputs are generated from these patterns along with some background patterns. We prove that neural networks trained by gradient descent can succeed on these problems. The success relies on the emergence and improvement of effective features, which are learned efficiently among exponentially many candidates by exploiting the data (in particular, the structure of the input distribution). In contrast, no linear model over polynomially many data-independent features can achieve comparably small errors. Furthermore, if the specific input structure is removed, then no polynomial-time algorithm in the Statistical Query model can learn even weakly. These results provide theoretical evidence that feature learning in neural networks depends strongly on the input structure and leads to their superior performance. Our preliminary experimental results on synthetic and real data also provide positive support.
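To make the setup concrete, the following is a minimal sketch under our own illustrative assumptions (not the paper's exact distribution, architecture, or hyperparameters): inputs are sparse binary vectors whose `relevant` coordinates carry the class-relevant pattern and whose remaining coordinates carry random background patterns, and a two-layer ReLU network with a fixed second layer is trained by plain gradient descent on the logistic loss. All names and sizes (`sample`, `relevant`, `m`, `lr`, `steps`) are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data model illustrating the setup above (not the paper's exact
# distribution): a small set of class-relevant coordinates determines the label,
# while the remaining coordinates carry random background patterns.
d, k, n = 50, 3, 2000
relevant = rng.choice(d, size=k, replace=False)           # class-relevant coordinates

def sample(n_samples):
    X = (rng.random((n_samples, d)) < 0.1).astype(float)  # sparse background patterns
    y = 2 * rng.integers(0, 2, size=n_samples) - 1        # labels in {-1, +1}
    X[:, relevant] = (y[:, None] + 1) / 2                 # plant the pattern iff y = +1
    return X, y.astype(float)

X_train, y_train = sample(n)
X_test, y_test = sample(n)

# Two-layer ReLU network f(x) = a^T relu(W x); only the first layer is trained,
# by plain gradient descent on the averaged logistic loss.
m, lr, steps = 64, 1.0, 500
W = rng.normal(scale=1.0 / np.sqrt(d), size=(m, d))       # first-layer weights
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)          # fixed second layer

def forward(X):
    H = np.maximum(X @ W.T, 0.0)                          # hidden activations
    return H, H @ a

for _ in range(steps):
    H, out = forward(X_train)
    g = -y_train / (1.0 + np.exp(y_train * out)) / n      # d(loss)/d(output)
    grad_W = (g[:, None] * (H > 0) * a[None, :]).T @ X_train
    W -= lr * grad_W

_, out = forward(X_test)
print("test accuracy:", np.mean(np.sign(out) == y_test))
```

In this toy instance, one would expect the trained first-layer weights to concentrate on the `relevant` coordinates, which is the kind of emergent feature the abstract refers to; the exact data model and guarantees are those stated in the paper, not this sketch.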