The ever-growing size of foundation language models has brought significant performance gains across a wide range of downstream tasks. However, this growth also carries side effects such as high deployment cost, limited availability, and environmental impact, which motivates interest in alternative directions such as divide-and-conquer schemes. In this paper, we ask a basic question: are language processes naturally divisible? We study this question in a simple two-tower language model setting, where two language models with identical configurations are trained side by side cooperatively. In this setting, we discover a spontaneous emerging preference phenomenon: some tokens are consistently predicted better by one tower, while others are predicted better by the other. This phenomenon is qualitatively stable regardless of model configuration and type, suggesting that it is an intrinsic property of natural language. This study indicates that interesting properties of natural language remain to be discovered, which may aid the future development of natural language processing techniques.
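The abstract does not specify how the two towers cooperate or how per-token preference is measured; the following is a minimal PyTorch sketch of one plausible instantiation, assuming the towers are combined as an equal-weight mixture of their next-token distributions and that "preference" is read off from which tower assigns the true token the higher probability. The class names (`TinyTower`, `TwoTowerLM`), the mixture rule, and all hyperparameters are illustrative assumptions, not the paper's method.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyTower(nn.Module):
    """One tower: a small causal (GPT-style) language model."""
    def __init__(self, vocab_size, d_model=128, n_heads=4, n_layers=2, max_len=256):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):
        B, T = ids.shape
        x = self.tok(ids) + self.pos(torch.arange(T, device=ids.device))
        # causal mask so each position attends only to its past
        mask = torch.triu(torch.full((T, T), float("-inf"), device=ids.device),
                          diagonal=1)
        return self.head(self.blocks(x, mask=mask))  # (B, T, vocab) logits


class TwoTowerLM(nn.Module):
    """Two identically configured towers trained side by side; their
    next-token distributions are combined as an equal-weight mixture
    (an assumption: the combination rule is not given in the abstract)."""
    def __init__(self, vocab_size, **kw):
        super().__init__()
        self.tower_a = TinyTower(vocab_size, **kw)
        self.tower_b = TinyTower(vocab_size, **kw)

    def forward(self, ids, targets):
        lp_a = F.log_softmax(self.tower_a(ids), dim=-1)
        lp_b = F.log_softmax(self.tower_b(ids), dim=-1)
        # log-probability each tower assigns to the true next token
        tgt = targets.unsqueeze(-1)
        lp_a = lp_a.gather(-1, tgt).squeeze(-1)  # (B, T)
        lp_b = lp_b.gather(-1, tgt).squeeze(-1)
        # cooperative objective: NLL of the 50/50 mixture of the two towers
        mix = torch.logsumexp(torch.stack([lp_a, lp_b]), dim=0) - math.log(2)
        loss = -mix.mean()
        # per-token "preference": which tower predicted this token better
        prefers_a = lp_a > lp_b
        return loss, prefers_a


# Toy usage: random token ids stand in for a real corpus.
vocab = 1000
model = TwoTowerLM(vocab)
ids = torch.randint(vocab, (8, 64))
loss, prefers_a = model(ids[:, :-1], ids[:, 1:])
loss.backward()
print(loss.item(), prefers_a.float().mean().item())
```

Under these assumptions, aggregating `prefers_a` per token type over a held-out corpus would test the phenomenon described above: a consistent, stable preference would show up as token types whose preference fraction stays near 0 or 1 across runs and configurations.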