Recent works have demonstrated that neural networks exhibit extreme simplicity bias (SB). That is, they learn only the simplest features needed to solve the task at hand, even in the presence of other, more robust but more complex features. Due to the lack of a general and rigorous definition of features, these works showcase SB on semi-synthetic datasets such as Color-MNIST and MNIST-CIFAR, where features are relatively easy to define. In this work, we rigorously define as well as thoroughly establish SB for one-hidden-layer neural networks. More concretely, (i) we define SB as the network being essentially a function of a low-dimensional projection of the inputs; (ii) theoretically, we show that when the data is linearly separable, the network primarily depends on only the linearly separable ($1$-dimensional) subspace, even in the presence of an arbitrarily large number of other, more complex features that could have led to a significantly more robust classifier; (iii) empirically, we show that models trained on real datasets such as Imagenette and Waterbirds-Landbirds indeed depend on a low-dimensional projection of the inputs, thereby demonstrating SB on these datasets; (iv) finally, we present a natural ensemble approach that encourages diversity in models by training successive models on features not used by earlier models, and demonstrate that it yields models that are significantly more robust to Gaussian noise.
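The following is a minimal PyTorch sketch of the sequential-ensemble idea described in (iv), not the paper's exact algorithm: after training each one-hidden-layer network, estimate the low-dimensional input subspace it relies on (here, via the top singular directions of its first-layer weights, an illustrative proxy), project that subspace out of the inputs, and train the next model on the remainder. All function names and hyperparameters below are hypothetical.

```python
import torch
import torch.nn as nn

def train_one_hidden_layer(x, y, width=64, epochs=200, lr=1e-2):
    """Train a one-hidden-layer ReLU network with logistic loss.
    x: (n, d) float inputs; y: (n,) float 0/1 labels."""
    d = x.shape[1]
    model = nn.Sequential(nn.Linear(d, width), nn.ReLU(), nn.Linear(width, 1))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x).squeeze(-1), y)
        loss.backward()
        opt.step()
    return model

def top_directions(model, k=1):
    """Top-k right singular vectors of the first-layer weights, used here as
    a proxy for the low-dimensional projection the network depends on."""
    W = model[0].weight.detach()                      # (width, d)
    _, _, Vt = torch.linalg.svd(W, full_matrices=False)
    return Vt[:k]                                     # (k, d), orthonormal rows

def project_out(x, dirs):
    """Remove the span of `dirs` from every input (orthogonal complement)."""
    return x - x @ dirs.T @ dirs

def diverse_ensemble(x, y, n_models=3, k=1):
    """Sequentially train models, hiding the subspace each one used."""
    models, x_cur = [], x
    for _ in range(n_models):
        m = train_one_hidden_layer(x_cur, y)
        models.append(m)
        x_cur = project_out(x_cur, top_directions(m, k))
    return models
```

Projecting out the used subspace forces each successive model to rely on different input directions; per the abstract, ensembling such diverse models yields classifiers that are significantly more robust to Gaussian noise.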