Data augmentation is a ubiquitous technique for improving image classification when labeled data is scarce. Constraining the model predictions to be invariant to diverse data augmentations effectively injects the desired representational invariances into the model (e.g., invariance to photometric variations) and helps improve accuracy. Compared to image data, the appearance variations in videos are far more complex due to the additional temporal dimension. Yet, data augmentation methods for videos remain under-explored. This paper investigates various data augmentation strategies that capture different video invariances, including photometric, geometric, temporal, and actor/scene augmentations. When integrated with existing semi-supervised learning frameworks, we show that our data augmentation strategy leads to promising performance on the Kinetics-100/400, Mini-Something-v2, UCF-101, and HMDB-51 datasets in the low-label regime. We also validate our data augmentation strategy in the fully supervised setting and demonstrate improved performance.
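To make the idea of constraining predictions to be invariant to augmentations concrete, the following is a minimal sketch and not the paper's implementation. It assumes a clip stored as a NumPy array of shape (T, H, W, C) with values in [0, 1]. The function names `augment_clip`, `predict`, and `consistency_loss`, the specific augmentation choices (brightness scaling, horizontal flip, cyclic temporal shift), and the toy mean-pooling classifier are all hypothetical illustrations. Actor/scene augmentations are omitted.

```python
import numpy as np


def augment_clip(clip, rng):
    """Return a randomly augmented copy of a clip of shape (T, H, W, C)."""
    # Photometric (assumed choice): one brightness scale shared by all frames.
    scale = rng.uniform(0.6, 1.4)
    out = np.clip(clip * scale, 0.0, 1.0)
    # Geometric (assumed choice): flip every frame horizontally with prob. 0.5.
    if rng.random() < 0.5:
        out = out[:, :, ::-1, :]
    # Temporal (assumed choice): cyclic shift of the frame order.
    shift = int(rng.integers(0, clip.shape[0]))
    out = np.roll(out, shift, axis=0)
    return out


def predict(clip, weights):
    """Toy classifier: mean-pool over time and space, then linear + softmax."""
    features = clip.mean(axis=(0, 1, 2))   # shape (C,)
    logits = features @ weights            # shape (num_classes,)
    logits = logits - logits.max()         # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


def consistency_loss(clip, weights, rng):
    """Mean squared difference between predictions on two augmented views."""
    p_a = predict(augment_clip(clip, rng), weights)
    p_b = predict(augment_clip(clip, rng), weights)
    return float(np.mean((p_a - p_b) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.random((8, 16, 16, 3))      # synthetic clip, 8 frames of 16x16 RGB
    weights = rng.normal(size=(3, 5))      # 3 channels to 5 hypothetical classes
    print("consistency loss:", consistency_loss(clip, weights, rng))
```

In a semi-supervised setup, a term of this kind would typically be computed on unlabeled clips and added to the supervised loss on labeled clips. The weighting between the two terms is a design choice that this sketch does not address.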