Datasets often contain input dimensions that are unnecessary to predict the output label, e.g., the background in object recognition, and these dimensions lead to more trainable parameters. Deep Neural Networks (DNNs) are robust to increasing the number of parameters in the hidden layers, but it is unclear whether this robustness extends to the input layer. In this letter, we investigate the impact of unnecessary input dimensions on a central issue of DNNs: their data efficiency, i.e., the number of examples needed to achieve a given generalization performance. Our results show that unnecessary input dimensions that are task-unrelated substantially degrade data efficiency. This highlights the need for mechanisms that remove task-unrelated dimensions in order to enable data efficiency gains.
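The manipulation studied here can be illustrated with a minimal sketch: padding each input with extra dimensions that carry no information about the label, so that only the input layer grows. The function name and the Gaussian-noise padding below are illustrative assumptions, not the paper's exact experimental protocol.

```python
import numpy as np

def add_task_unrelated_dims(x, n_extra, rng=None):
    """Append task-unrelated input dimensions (pure noise) to each
    flattened example in x; the labels are left untouched.

    x: array of shape (n_examples, n_dims)
    n_extra: number of unnecessary dimensions to append
    """
    rng = np.random.default_rng() if rng is None else rng
    # Noise is independent of the label, so these dimensions cannot
    # help prediction; they only enlarge the input layer.
    noise = rng.standard_normal((x.shape[0], n_extra)).astype(x.dtype)
    return np.concatenate([x, noise], axis=1)

# Example: 100 toy examples with 784 informative dimensions, padded
# with 256 task-unrelated dimensions. Data efficiency can then be
# compared by training on x vs. x_padded at varying training-set sizes.
x = np.random.default_rng(0).standard_normal((100, 784)).astype(np.float32)
x_padded = add_task_unrelated_dims(x, n_extra=256)
print(x_padded.shape)  # (100, 1040)
```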