Deep contextual language models (LMs) such as ELMo, BERT, and their successors dominate Natural Language Processing because a single pre-trained model can be adapted rapidly to many tasks through task-specific fine-tuning. Furthermore, multilingual variants such as XLM-R and mBERT have shown promising results in zero-shot cross-lingual transfer, potentially enabling NLP applications in many under-served and under-resourced languages. Owing to this initial success, pre-trained models are being used as "Universal Language Models", serving as the starting point across diverse tasks, domains, and languages. This work explores the notion of "Universality" by identifying seven dimensions across which a universal model should be able to scale, that is, perform equally well or reasonably well, to be useful across diverse settings. We outline the current theoretical and empirical results that support model performance across these dimensions, along with extensions that may help address some of their current limitations. Through this survey, we lay the foundation for understanding the capabilities and limitations of massive contextual language models, and we help discern research gaps and directions for future work to make these LMs inclusive and fair to diverse applications, users, and linguistic phenomena.