Language features are ever-evolving in the real-world social media environment. Many trained natural language understanding (NLU) models, unable to perform semantic inference on unseen features, may consequently suffer from deteriorating performance in such dynamicity. To address this challenge, we empirically study social media NLU in a dynamic setup, where models are trained on past data and tested on future data. This better reflects realistic practice than the commonly adopted static setup with random data splits. To further analyze how models adapt to dynamicity, we explore the usefulness of leveraging unlabeled data created after a model is trained. We examine unsupervised domain adaptation baselines based on auto-encoding and pseudo-labeling, as well as a joint framework coupling both. Extensive results on four social media tasks indicate that evolving environments universally degrade classification accuracy, while auto-encoding and pseudo-labeling together exhibit the best robustness to dynamicity.
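To make the setup concrete, below is a minimal sketch (not the paper's released implementation) of how a joint framework coupling auto-encoding and pseudo-labeling over unlabeled future data might be trained alongside the supervised objective on past labeled data. The model architecture, loss weights, confidence threshold, and bag-of-words inputs are all illustrative assumptions.

```python
# Minimal sketch of joint training: supervised loss on past labeled data,
# plus auto-encoding (reconstruction) and pseudo-labeling losses on
# unlabeled data collected after the labeled training period.
# All hyperparameters and dimensions are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class JointModel(nn.Module):
    def __init__(self, vocab_size=5000, hidden=128, num_classes=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(vocab_size, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, vocab_size)      # auto-encoding head
        self.classifier = nn.Linear(hidden, num_classes)  # NLU task head

    def forward(self, x):
        z = self.encoder(x)
        return self.classifier(z), self.decoder(z)


def joint_loss(model, x_past, y_past, x_future,
               w_recon=0.1, w_pseudo=0.5, threshold=0.9):
    """Combine supervised, reconstruction, and pseudo-label losses."""
    # Supervised loss on past (labeled) data.
    logits_past, _ = model(x_past)
    loss_sup = F.cross_entropy(logits_past, y_past)

    # Auto-encoding loss on unlabeled future data.
    logits_future, recon_future = model(x_future)
    loss_recon = F.mse_loss(recon_future, x_future)

    # Pseudo-labeling: keep only confident predictions on future data.
    probs = logits_future.softmax(dim=-1).detach()
    conf, pseudo_y = probs.max(dim=-1)
    mask = conf >= threshold
    loss_pseudo = (F.cross_entropy(logits_future[mask], pseudo_y[mask])
                   if mask.any() else torch.tensor(0.0))

    return loss_sup + w_recon * loss_recon + w_pseudo * loss_pseudo


if __name__ == "__main__":
    torch.manual_seed(0)
    model = JointModel()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Toy bag-of-words batches standing in for past/future social media posts.
    x_past, y_past = torch.rand(32, 5000), torch.randint(0, 4, (32,))
    x_future = torch.rand(32, 5000)
    loss = joint_loss(model, x_past, y_past, x_future)
    loss.backward()
    opt.step()
    print(f"joint loss: {loss.item():.4f}")
```

In this sketch, the reconstruction loss encourages the encoder to model the distribution of future-time text, while high-confidence pseudo-labels let the classifier adapt without ground-truth annotations; the supervised loss on past data anchors the task semantics.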