Data augmentation is a critical contributing factor to the success of deep learning, but it relies heavily on prior domain knowledge that is not always available. Recent works on automatic data augmentation learn a policy that forms a sequence of augmentation operations, which are still pre-defined and restricted to limited options. In this paper, we show that the objective of prior-free autonomous data augmentation can be derived from a representation learning principle that aims to preserve the minimum sufficient information of the labels. Given an example, the objective is to create a distant "hard positive example" as the augmentation while still preserving the original label. We then propose a practical surrogate to this objective that can be optimized efficiently and integrated seamlessly into existing methods for a broad class of machine learning tasks, e.g., supervised, semi-supervised, and noisy-label learning. Unlike previous works, our method does not require training an extra generative model; instead, it leverages the intermediate-layer representations of the end-task model to generate data augmentations. In experiments, we show that our method consistently brings non-trivial improvements to the three aforementioned learning tasks in both efficiency and final performance, whether or not it is combined with strong pre-defined augmentations, e.g., on medical images where domain knowledge is unavailable and existing augmentation techniques perform poorly. Code is available at: https://github.com/kai-wen-yang/LPA3.
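The core idea of perturbing an intermediate-layer representation toward higher loss, while keeping the step small enough that the original label is preserved, can be illustrated with a toy example. This is a minimal NumPy sketch, not the authors' actual algorithm: the two-layer model, the random weights, the single gradient-ascent step, and the step size `eps` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer classifier x -> hidden h -> logits; random weights stand in
# for a trained end-task model (hypothetical, for illustration only).
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(3, 8))

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def loss_and_grad_h(h, y):
    """Cross-entropy loss and its gradient w.r.t. the hidden representation h."""
    p = softmax(W2 @ h)
    loss = -np.log(p[y])
    grad_h = W2.T @ (p - np.eye(3)[y])   # d loss / d h
    return loss, grad_h

x = rng.normal(size=4)
h = np.maximum(W1 @ x, 0.0)              # intermediate-layer representation
y = int(np.argmax(W2 @ h))               # treat the model's prediction as the label

# "Hard positive" in latent space: one small gradient-ascent step on h that
# increases the loss; eps is kept small so the label should stay unchanged.
loss0, g = loss_and_grad_h(h, y)
eps = 0.05
h_aug = h + eps * g / (np.linalg.norm(g) + 1e-12)

loss1, _ = loss_and_grad_h(h_aug, y)
label_preserved = int(np.argmax(W2 @ h_aug)) == y
```

Because the cross-entropy loss is convex in `h` (the logits are linear in `h`), a small step along the gradient is guaranteed to increase the loss, so `h_aug` is a harder version of the same example; the label check confirms the perturbation did not cross the decision boundary for this step size.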