Data augmentation is a key element of deep learning pipelines, as it informs the network during training about transformations of the input data that leave the label unchanged. Manually finding adequate augmentation methods and parameters for a given pipeline, however, quickly becomes cumbersome. In particular, while intuition can guide this choice for images, the design and selection of augmentation policies remain unclear for more complex types of data, such as neuroscience signals. Moreover, class-dependent augmentation strategies have been surprisingly unexplored in the literature, although the idea is quite intuitive: changing the color of a car image does not change the object class to be predicted, but doing the same to a picture of an orange does. This paper investigates gradient-based automatic data augmentation algorithms amenable to class-wise policies, whose search spaces are exponentially larger. Motivated by supervised learning applications using EEG signals, for which good augmentation policies are mostly unknown, we propose a new differentiable relaxation of the problem. In the class-agnostic setting, results show that our new relaxation leads to optimal performance with faster training than competing gradient-based methods, while also outperforming gradient-free methods in the class-wise setting. This work also proposes novel differentiable augmentation operations relevant for sleep stage classification.
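To make the notion of a class-wise policy concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes a fixed, hand-written mapping from class label to augmentation, whereas the paper learns such policies by gradient-based search. The operation names (`sign_flip`, `time_reverse`), the `CLASS_POLICY` table, and the search-space parameterization in `search_space_sizes` are all hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical label-preserving operations on signals shaped (channels, time).
def sign_flip(x):
    return -x

def time_reverse(x):
    return x[..., ::-1]

def identity(x):
    return x

# Hypothetical class-wise policy: each class label gets its own operation.
# A class-agnostic policy would apply one shared operation to every class.
CLASS_POLICY = {0: sign_flip, 1: time_reverse}

def augment_classwise(X, y, policy, default=identity):
    """Apply to each sample the operation assigned to its class label."""
    out = np.empty_like(X)
    for i, (xi, yi) in enumerate(zip(X, y)):
        out[i] = policy.get(int(yi), default)(xi)
    return out

def search_space_sizes(n_ops, n_magnitudes, n_steps, n_classes):
    """Count discrete policies under an assumed parameterization:
    a policy is a sequence of n_steps (operation, magnitude) choices,
    and the class-wise variant picks one such sequence per class."""
    agnostic = (n_ops * n_magnitudes) ** n_steps
    classwise = agnostic ** n_classes
    return agnostic, classwise
```

Under this assumed parameterization, the class-wise count is the class-agnostic count raised to the number of classes, which illustrates why the search space grows exponentially and why an efficient gradient-based search becomes attractive.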