The performance of text classification models tends to drop over time due to changes in data, which limits the lifetime of a pretrained model. Being able to predict how well a model will persist over time can therefore help design models that remain effective over a longer period. In this paper, we provide a thorough discussion of the problem and establish an evaluation setup for the task. We look at this problem from a practical perspective by assessing the ability of a wide range of language models and classification algorithms to persist over time, as well as by examining how dataset characteristics can help predict the temporal stability of different models. We perform longitudinal classification experiments on three datasets spanning between 6 and 19 years and covering diverse tasks and types of data. By splitting each longitudinal dataset into yearly subsets, we run a comprehensive set of experiments in which training and test data are separated by varying numbers of years, both into the past and into the future. This enables a gradual investigation of how the temporal gap between training and test sets affects classification performance, as well as a measurement of the extent to which models persist over time.
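The following is a minimal sketch (not the authors' code) of the year-by-year evaluation protocol described above: train a classifier on the data from one year, test it on every other year, and record performance as a function of the signed temporal gap. The dataset layout (a dict mapping year to texts and labels) and the TF-IDF plus logistic regression pipeline are illustrative assumptions.

```python
from collections import defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline


def temporal_gap_evaluation(data_by_year):
    """data_by_year: {year: (list_of_texts, list_of_labels)} -- hypothetical format."""
    scores_by_gap = defaultdict(list)
    for train_year, (train_x, train_y) in data_by_year.items():
        # Train on a single year's data (illustrative model choice).
        model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        model.fit(train_x, train_y)
        for test_year, (test_x, test_y) in data_by_year.items():
            if test_year == train_year:
                continue
            # Negative gap: testing on the past; positive gap: testing on the future.
            gap = test_year - train_year
            preds = model.predict(test_x)
            scores_by_gap[gap].append(f1_score(test_y, preds, average="macro"))
    # Average performance per temporal gap shows how quickly the model degrades.
    return {gap: sum(s) / len(s) for gap, s in sorted(scores_by_gap.items())}
```

In this sketch, averaging scores over all train/test pairs with the same gap gives a single degradation curve per dataset, which is one simple way to quantify temporal persistence.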