In this paper, we study the "stability" of machine learning (ML) models within the context of larger, complex NLP systems with continuous training data updates. For this study, we propose a methodology for assessing model stability (which we refer to as *jitter*) under various experimental conditions. Through experiments on four text classification tasks and two sequence labeling tasks, we find that model design choices, including network architecture and input representation, have a critical impact on stability. In classification tasks, non-RNN-based models are observed to be more stable than RNN-based ones, while the encoder-decoder model is less stable in sequence labeling tasks. Moreover, input representations based on pre-trained fastText embeddings yield greater stability than other choices. We also show that two learning strategies -- ensemble models and incremental training -- have a significant influence on stability. We recommend that ML model designers account for trade-offs between accuracy and jitter when making modeling choices.