Building for Tomorrow: Assessing the Temporal Persistence of Text Classifiers

The performance of text classification models tends to drop over time as the underlying data changes, which limits how long they remain useful. Being able to predict how well a model will persist over time can therefore help in designing models that stay effective for longer. In this paper, we approach this problem from a practical perspective by assessing how well a wide range of language models and classification algorithms persist over time, and how dataset characteristics can help predict the temporal stability of different models. We perform longitudinal classification experiments on three datasets spanning between 6 and 19 years and involving diverse tasks and types of data. We find that one can estimate how well a model will retain its performance over time based on (i) how well the model performs over a restricted time period and how that performance extrapolates to a longer time period, and (ii) the linguistic characteristics of the dataset, such as the familiarity score between subsets from different years. These findings have important implications for the design of text classification models that aim to preserve performance over time.
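To make the idea of a longitudinal classification experiment concrete, the following is a minimal sketch, not the setup used in the paper: a model is trained on one year and then scored separately on each later year, so that any decay in accuracy becomes visible. The word-count classifier, the function names (`train_word_counts`, `predict`, `accuracy_by_year`) and the `TOY_DATA` examples are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def train_word_counts(examples):
    """Count word frequencies per label (a toy bag-of-words model, assumed here)."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts


def predict(counts, text):
    """Pick the label whose training vocabulary overlaps most with the text.

    Ties are broken by alphabetical label order so the result is deterministic.
    """
    tokens = text.lower().split()
    return max(sorted(counts), key=lambda label: sum(counts[label][t] for t in tokens))


def accuracy_by_year(data_by_year, train_year):
    """Train on `train_year`, then report accuracy on every later year."""
    model = train_word_counts(data_by_year[train_year])
    scores = {}
    for year in sorted(data_by_year):
        if year <= train_year:
            continue  # only evaluate on data from after the training period
        examples = data_by_year[year]
        correct = sum(predict(model, text) == label for text, label in examples)
        scores[year] = correct / len(examples)
    return scores


# Made-up examples, purely illustrative: vocabulary drifts in the later year.
TOY_DATA = {
    2010: [("great phone love it", "pos"), ("terrible battery hate it", "neg")],
    2011: [("love this great screen", "pos"), ("hate the terrible camera", "neg")],
    2015: [("this slaps fr", "pos"), ("terrible mid device", "neg")],
}

if __name__ == "__main__":
    print(accuracy_by_year(TOY_DATA, train_year=2010))
```

In a real study the toy classifier would be replaced by the language models and classification algorithms under comparison, and the per-year scores would be the quantity whose trend is examined over the longer time span.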
