Learning with noisy labels (LNL) is a classic problem that has been extensively studied for image tasks, but far less so for video. A straightforward migration from images to videos that ignores the properties of videos, such as computational cost and redundant information, is not a sound choice. In this paper, we propose two new strategies for video analysis with noisy labels: 1) a lightweight channel selection method dubbed Channel Truncation for feature-based label noise detection, which selects the most discriminative channels to split clean and noisy instances in each category; and 2) a novel contrastive strategy dubbed Noise Contrastive Learning, which constructs the relationship between clean and noisy instances to regularize model training. Experiments on three well-known benchmark datasets for video classification show that our proposed tru{\bf N}cat{\bf E}-split-contr{\bf A}s{\bf T} (NEAT) significantly outperforms existing baselines. By reducing the feature dimension to 10\% of the original, our method achieves a noise detection F1-score of over 0.4 and a 5\% classification accuracy improvement on the Mini-Kinetics dataset under severe noise (symmetric-80\%). Thanks to Noise Contrastive Learning, the average classification accuracy improvement on Mini-Kinetics and Sth-Sth-V1 exceeds 1.6\%.
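To make the channel-truncation idea concrete, below is a minimal sketch of per-class channel selection for splitting clean and noisy instances. The specific choices here are assumptions for illustration only, not the paper's exact recipe: channels are ranked by class-mean activation magnitude, the top 10\% are kept, and instances are split by cosine similarity to the truncated class prototype; all function and variable names are hypothetical.

\begin{verbatim}
import numpy as np

def truncate_and_split(features, labels, keep_ratio=0.1):
    """features: (N, C) array; labels: (N,) possibly noisy class ids."""
    n, c = features.shape
    k = max(1, int(c * keep_ratio))
    clean_mask = np.zeros(n, dtype=bool)
    for cls in np.unique(labels):
        idx = np.where(labels == cls)[0]
        class_feats = features[idx]                    # (n_cls, C)
        # Score channels by mean activation within the class (assumed criterion).
        scores = class_feats.mean(axis=0)
        top_channels = np.argsort(scores)[-k:]         # keep top-k channels
        truncated = class_feats[:, top_channels]       # (n_cls, k)
        prototype = truncated.mean(axis=0)
        # Cosine similarity of each instance to the truncated class prototype.
        sims = (truncated @ prototype) / (
            np.linalg.norm(truncated, axis=1) * np.linalg.norm(prototype) + 1e-8)
        # Toy split rule: instances above the median similarity are kept as clean.
        clean_mask[idx[sims >= np.median(sims)]] = True
    return clean_mask

# Example usage with random features standing in for a video backbone's output.
rng = np.random.default_rng(0)
feats = rng.normal(size=(256, 512)).astype(np.float32)
labels = rng.integers(0, 10, size=256)
print(truncate_and_split(feats, labels).mean())  # fraction flagged as clean
\end{verbatim}

Working in the truncated (here, 10\%) channel subspace is what keeps the detection step lightweight; the resulting clean/noisy split can then feed a contrastive regularizer, though the exact formulation of Noise Contrastive Learning is not reproduced in this sketch.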