DELTA: Degradation-Free Fully Test-Time Adaptation

Fully test-time adaptation aims to adapt a pre-trained model to the test stream during real-time inference, which is urgently required when the test distribution differs from the training distribution. Several efforts have been devoted to improving adaptation performance. However, we find that two defects are hidden in prevalent adaptation methods such as test-time batch normalization (BN) and self-learning. First, we reveal that the normalization statistics in test-time BN are determined entirely by the currently received test samples, resulting in inaccurate estimates. Second, we show that during test-time adaptation, the parameter update is biased towards some dominant classes. Beyond the extensively studied test stream with independent and class-balanced samples, we further observe that both defects can be exacerbated in more complicated test environments, such as (time-)dependent or class-imbalanced data. Previous approaches work well in certain scenarios but degrade in others because of these defects. In this paper, we provide a plug-in solution called DELTA, for Degradation-freE fuLly Test-time Adaptation, which consists of two components: (i) Test-time Batch Renormalization (TBR), introduced to improve the estimated normalization statistics, and (ii) Dynamic Online re-weighTing (DOT), designed to address the class bias within optimization. We investigate various test-time adaptation methods on three commonly used datasets with four scenarios, and on a newly introduced real-world dataset. DELTA helps them handle all scenarios simultaneously, leading to state-of-the-art performance.
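
The abstract gives no formulas for either component, so the following is only a rough sketch under stated assumptions, not the authors' implementation. It assumes that TBR follows the standard batch renormalization correction, with running statistics kept as exponential moving averages over the test stream, and that DOT weights each sample inversely to a running estimate of its predicted class frequency. The function names `renormalize_batch` and `class_balanced_weights`, the momentum values, and the exact update rules are all hypothetical.

```python
import numpy as np


def renormalize_batch(x, running_mean, running_var, momentum=0.05, eps=1e-5):
    """Assumed TBR-style step for features x of shape (N, C).

    Normalizes with the batch statistics, then corrects towards the running
    statistics with the batch renormalization terms r and d.
    """
    batch_mean = x.mean(axis=0)
    batch_var = x.var(axis=0)
    batch_std = np.sqrt(batch_var + eps)
    running_std = np.sqrt(running_var + eps)

    # Correction terms; in a training framework these would be treated as
    # constants (no gradient flows through them).
    r = batch_std / running_std
    d = (batch_mean - running_mean) / running_std
    x_hat = (x - batch_mean) / batch_std * r + d

    # Assumption: running statistics are exponential moving averages.
    new_mean = running_mean + momentum * (batch_mean - running_mean)
    new_var = running_var + momentum * (batch_var - running_var)
    return x_hat, new_mean, new_var


def class_balanced_weights(probs, class_freq, momentum=0.95, eps=1e-8):
    """Assumed DOT-style step for softmax outputs probs of shape (N, K).

    Samples predicted as currently frequent classes receive smaller weights.
    """
    preds = probs.argmax(axis=1)
    weights = 1.0 / (class_freq[preds] + eps)
    # Rescale so the weights average to 1 over the batch.
    weights = weights / weights.sum() * len(weights)

    # Assumption: class frequencies are tracked as a moving average of the
    # mean predicted distribution.
    new_freq = momentum * class_freq + (1.0 - momentum) * probs.mean(axis=0)
    return weights, new_freq
```

In a self-learning method, the returned weights would multiply the per-sample loss before averaging, so that dominant classes contribute less to each parameter update. In the forward pass, the renormalized output is equivalent to normalizing with the running statistics, which is what makes the estimate less dependent on the current batch alone.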
