We introduce Temporal consistency for Test-time adaptation (TempT), a novel method for test-time adaptation on videos that uses the temporal coherence of predictions across sequential frames as a self-supervision signal. TempT has broad potential applications in computer vision tasks, including facial expression recognition (FER) in videos. We evaluate TempT's performance on the AffWild2 dataset. Our approach focuses solely on the unimodal visual aspect of the data and uses a popular 2D CNN backbone, in contrast to the larger sequential or attention-based models used in other approaches. Our preliminary experimental results demonstrate that TempT is competitive with performances reported in previous years, and its efficacy provides a compelling proof of concept for its use in a variety of real-world applications.
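To make the core idea concrete, the following is a minimal PyTorch sketch of test-time adaptation driven by a temporal-consistency objective. The specific loss form (a symmetric KL divergence between predictions on consecutive frames), the choice to update only normalization-layer parameters, and all function and parameter names here are illustrative assumptions, not the paper's exact procedure.

```python
# A minimal sketch of test-time adaptation with a temporal-consistency loss.
# Assumes a PyTorch classifier whose backbone contains normalization layers;
# the loss and the adapted parameter set are plausible choices, not TempT's exact recipe.
import torch
import torch.nn.functional as F

def temporal_consistency_loss(logits: torch.Tensor) -> torch.Tensor:
    """Penalize disagreement between predictions on consecutive frames.

    logits: (T, C) tensor of per-frame class logits for one video clip.
    """
    probs = logits.softmax(dim=-1)
    # Symmetric KL between each frame and its successor (one plausible choice).
    kl_fwd = F.kl_div(probs[1:].log(), probs[:-1], reduction="batchmean")
    kl_bwd = F.kl_div(probs[:-1].log(), probs[1:], reduction="batchmean")
    return 0.5 * (kl_fwd + kl_bwd)

def adapt_on_clip(model: torch.nn.Module, frames: torch.Tensor,
                  steps: int = 1, lr: float = 1e-4) -> torch.Tensor:
    """Take a few gradient steps on one unlabeled test clip, then predict.

    frames: (T, 3, H, W) tensor of consecutive frames from a test video.
    """
    # Adapting only affine normalization parameters is a common TTA choice
    # that keeps the update lightweight and stable.
    params = [p for m in model.modules()
              if isinstance(m, (torch.nn.BatchNorm2d, torch.nn.LayerNorm))
              for p in m.parameters()]
    optimizer = torch.optim.SGD(params, lr=lr)
    for _ in range(steps):
        loss = temporal_consistency_loss(model(frames))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        return model(frames).argmax(dim=-1)  # per-frame class predictions
```

In this sketch, smoothing predictions across neighboring frames supplies the supervision signal that labels would otherwise provide, which is why the adaptation can run entirely at test time on unlabeled video.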