Across applications spanning supervised classification and sequential control, deep learning has been reported to find "shortcut" solutions that fail catastrophically under minor changes in the data distribution. In this paper, we show empirically that DNNs can be coaxed to avoid poor shortcuts by providing an additional "priming" feature computed from key input features, usually a coarse output estimate. Priming relies on approximate domain knowledge of these task-relevant key input features, which is often easy to obtain in practical settings. For example, one might prioritize recent frames over past frames in a video input for visual imitation learning, or salient foreground over background pixels for image classification. On NICO image classification, MuJoCo continuous control, and CARLA autonomous driving, our priming strategy significantly outperforms several popular state-of-the-art approaches for feature selection and data augmentation. We connect these empirical findings to recent theoretical results on DNN optimization, and argue theoretically that priming distracts the optimizer away from poor shortcuts by creating better, simpler shortcuts.
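To make the priming idea concrete, the following is a minimal sketch of how such an architecture could look in PyTorch: a small "primer" head produces a coarse output estimate from the hand-selected key input features alone, and that estimate is concatenated to the full input before the main network. All names here (`PrimedClassifier`, `primer`, `backbone`) and the specific layer sizes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PrimedClassifier(nn.Module):
    """Illustrative sketch of priming (not the paper's code):
    a cheap primer head maps task-relevant key features
    (e.g., foreground pixels, recent video frames) to a coarse
    output estimate, which is appended to the main network's input."""

    def __init__(self, key_dim: int, input_dim: int, num_classes: int):
        super().__init__()
        # Coarse estimator over the key input features only.
        self.primer = nn.Linear(key_dim, num_classes)
        # Main network sees the full input plus the priming feature.
        self.backbone = nn.Sequential(
            nn.Linear(input_dim + num_classes, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x, key_features):
        coarse = self.primer(key_features)       # coarse output estimate
        primed = torch.cat([x, coarse], dim=-1)  # append as priming feature
        return self.backbone(primed), coarse


# Usage sketch: `key` is a domain-knowledge slice of the input
# (e.g., a flattened foreground crop); `x` is the full flattened image.
model = PrimedClassifier(key_dim=64, input_dim=3072, num_classes=10)
x = torch.randn(8, 3072)
key = torch.randn(8, 64)
logits, coarse = model(x, key)
```

Under this reading, the optimizer can reach a reasonable solution by leaning on the coarse estimate (a simple, benign shortcut), which draws it away from spurious shortcuts hidden elsewhere in the input.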