To enable video models to be applied seamlessly across video tasks in different environments, various Video Unsupervised Domain Adaptation (VUDA) methods have been proposed to improve the robustness and transferability of video models. Despite the resulting gains in robustness, these VUDA methods require access to both the source data and the source model parameters for adaptation, which raises serious data privacy and model portability issues. To address these concerns, this paper first formulates Black-box Video Domain Adaptation (BVDA), a more realistic yet challenging scenario in which the source video model is provided only as a black-box predictor. Although a few Black-box Domain Adaptation (BDA) methods have been proposed in the image domain, they cannot be applied to the video domain, because the video modality has more complicated temporal features that are harder to align. To address BVDA, we propose a novel Endo and eXo-TEmporal Regularized Network (EXTERN), which applies mask-to-mix strategies and two video-tailored regularizations, endo-temporal regularization and exo-temporal regularization, performed across both clip and temporal features, while distilling knowledge from the predictions of the black-box predictor. Empirical results demonstrate the state-of-the-art performance of EXTERN across various cross-domain closed-set and partial-set action recognition benchmarks, where it even surpasses most existing video domain adaptation methods that have access to source data.
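The black-box setting described above can be illustrated with a minimal sketch. It is not the EXTERN method: the mask-to-mix strategies and the endo- and exo-temporal regularizations are omitted, and only the distillation idea is shown. The names `black_box_predict` and `distillation_loss`, the hidden linear layer standing in for the source model, the four-class setup, and the use of KL divergence as the distillation objective are all assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def black_box_predict(clip_features, num_classes=4):
    """Hypothetical stand-in for the source predictor.

    The caller sees only the returned class probabilities; the weights
    below play the role of the inaccessible source model parameters.
    """
    rng = random.Random(0)  # fixed seed so the hidden "model" is deterministic
    hidden = [[rng.gauss(0, 1) for _ in range(num_classes)] for _ in clip_features]
    logits = [
        sum(f * row[c] for f, row in zip(clip_features, hidden))
        for c in range(num_classes)
    ]
    return softmax(logits)


def distillation_loss(student_logits, teacher_probs, eps=1e-12):
    """KL(teacher || student) for one sample (assumed objective)."""
    student_probs = softmax(student_logits)
    return sum(
        t * (math.log(t + eps) - math.log(s + eps))
        for t, s in zip(teacher_probs, student_probs)
    )


# One unlabeled target-domain clip, represented by a toy feature vector.
rng = random.Random(1)
clip = [rng.gauss(0, 1) for _ in range(16)]

teacher_probs = black_box_predict(clip)             # only predictions are observed
student_logits = [rng.gauss(0, 1) for _ in range(4)]  # untrained target model output
loss = distillation_loss(student_logits, teacher_probs)
print(f"distillation loss: {loss:.4f}")
```

In a real pipeline the target model would be trained by minimizing such a loss over unlabeled target videos. The sketch only makes concrete that the adaptation signal comes from the predictor's outputs, not from source data or source parameters.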