Test-time augmentation (TTA), the aggregation of predictions across transformed versions of a test input, is an established technique for improving the performance of image classification models. Importantly, TTA can improve model performance post hoc, without additional training. Although TTA can be applied to any data modality, it has seen limited adoption in NLP, due in part to the difficulty of identifying label-preserving transformations. In this paper, we present augmentation policies that yield significant accuracy improvements with language models. A key finding is that the design of the augmentation policy, for instance the number of samples generated from a single, non-deterministic augmentation, has a considerable impact on the benefit of TTA. Experiments across a binary classification task and dataset show that TTA can deliver consistent improvements over current state-of-the-art approaches.
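The core aggregation step can be sketched as follows. This is a minimal illustration, not the paper's implementation: `model`, `augment`, and `tta_predict` are hypothetical names, and the aggregation shown is a simple average of per-class probabilities over the original input plus several augmented copies.

```python
def tta_predict(model, augment, x, num_samples=4):
    """Test-time augmentation sketch: average class-probability
    predictions over the original input and augmented copies of it.

    model:   callable mapping an input to a list of class probabilities
    augment: callable producing a (label-preserving) transformed input
    """
    preds = [model(x)]  # always include the unaugmented input
    for _ in range(num_samples):
        preds.append(model(augment(x)))
    # Aggregate by averaging each class's probability across samples.
    n = len(preds)
    return [sum(p[i] for p in preds) / n for i in range(len(preds[0]))]


if __name__ == "__main__":
    # Toy usage with placeholder components (not from the paper):
    # a fixed-output "model" and an identity "augmentation".
    model = lambda x: [0.6, 0.4]
    augment = lambda x: x
    print(tta_predict(model, augment, "an example input", num_samples=3))
```

For a non-deterministic augmentation (e.g. synonym replacement with random word choices), each call to `augment` yields a different variant, and `num_samples` controls how many are drawn, the policy-design knob the abstract highlights.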