Partial Video Domain Adaptation (PVDA), which assumes that the source label space subsumes the target label space, is a more general and practical setting for cross-domain video classification. The key challenge of PVDA is to mitigate the negative transfer caused by outlier classes that exist only in the source domain. A crucial step toward this goal is to aggregate target predictions into class weights that up-weight target classes and down-weight outlier classes. However, incorrectly predicted class weights can mislead the network and themselves cause negative transfer. Previous works improve class-weight accuracy by exploiting temporal features and attention mechanisms, but these methods may fail to produce accurate class weights when domain shifts are large, as they are in most real-world scenarios. To address these challenges, we propose the Multi-modality Cluster-calibrated partial Adversarial Network (MCAN). MCAN enhances video feature extraction by combining multi-modal features from multiple temporal scales into more robust overall features. It also uses a novel class-weight calibration method to alleviate the negative transfer caused by incorrect class weights: the method attempts to identify correct and incorrect predictions and weight them accordingly, using distributional information implied by unsupervised clustering. Extensive experiments on prevailing PVDA benchmarks show that MCAN achieves significant improvements over state-of-the-art PVDA methods.
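To make the class-weight step concrete, the sketch below shows the common aggregation scheme used in earlier partial domain adaptation work such as PADA: average the softmax predictions over unlabeled target clips, then rescale so the most-predicted class receives weight 1. It also shows where a calibration step could plug in, via optional per-sample reliability scores. The function names, the `disagree_weight` parameter, and the cluster-agreement heuristic are all hypothetical illustrations; this is not MCAN's actual calibration method, whose details are not given here, and the cluster assignments are simply taken as input (from any clustering of target features).

```python
import numpy as np


def aggregate_class_weights(probs, sample_weights=None):
    """Average target softmax predictions into class weights, then max-normalize.

    probs: (N, C) softmax outputs on unlabeled target clips.
    sample_weights: optional (N,) reliability scores (hypothetical hook for a
        calibration step); None means every prediction counts equally.
    """
    probs = np.asarray(probs, dtype=float)
    if sample_weights is None:
        sample_weights = np.ones(probs.shape[0])
    w = np.asarray(sample_weights, dtype=float)
    # Weighted mean of predicted class probabilities over target samples.
    class_w = (w[:, None] * probs).sum(axis=0) / w.sum()
    # Rescale so the most-predicted class gets weight 1.
    return class_w / class_w.max()


def cluster_agreement_weights(probs, cluster_ids, disagree_weight=0.5):
    """Hypothetical reliability heuristic (not MCAN's method): fully trust a
    prediction that matches its cluster's majority predicted label, and
    down-weight one that disagrees."""
    preds = np.argmax(probs, axis=1)
    cluster_ids = np.asarray(cluster_ids)
    weights = np.empty(len(preds))
    for c in np.unique(cluster_ids):
        members = cluster_ids == c
        majority = np.bincount(preds[members]).argmax()
        weights[members] = np.where(preds[members] == majority, 1.0, disagree_weight)
    return weights


# Toy example: 5 target clips, 3 source classes; class 2 is a source-only outlier.
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.6, 0.1],
    [0.1, 0.2, 0.7],  # mispredicted as the outlier class
])
cluster_ids = [0, 0, 1, 1, 1]

plain = aggregate_class_weights(probs)  # [0.95, 1.0, 0.55]
calibrated = aggregate_class_weights(probs, cluster_agreement_weights(probs, cluster_ids))
# calibrated is about [0.974, 1.0, 0.395]: the outlier class weight drops.
```

In this toy case a single confident misprediction inflates the outlier class weight to 0.55; discounting predictions that disagree with their cluster lowers it to about 0.39, which illustrates, under these assumptions, why distributional information from clustering can help correct class weights.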