Borrowing the idea of production functions from microeconomics, this paper introduces a framework for systematically evaluating the performance and cost trade-offs between machine-translated and manually created labelled data for task-specific fine-tuning of massively multilingual language models. We illustrate the effectiveness of the framework through a case study on the TyDIQA-GoldP dataset. One interesting conclusion of the study is that if the cost of machine translation is greater than zero, the optimal performance at least cost is always achieved either with a mix that includes some manually created data or with manually created data alone. To our knowledge, this is the first attempt to extend the concept of production functions to the study of data collection strategies for training multilingual models, and the framework can serve as a valuable tool for other similar cost versus data trade-offs in NLP.
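For readers unfamiliar with production functions, the least-cost question raised in the abstract can be written as a standard cost-minimization problem. The sketch below is only illustrative: the symbols $m$, $t$, $c_m$, $c_t$, $f$ and $\bar{p}$ are assumed notation introduced here, not taken from the paper, and no particular functional form of $f$ is implied.

```latex
% Assumed (hypothetical) notation, not the paper's own:
%   m       amount of manually created labelled data
%   t       amount of machine-translated labelled data
%   c_m,c_t unit costs of the two data sources
%   f(m,t)  production function: model performance obtained from (m, t)
%   \bar{p} target performance level
\begin{equation}
  \min_{m \ge 0,\; t \ge 0} \; c_m\, m + c_t\, t
  \quad \text{subject to} \quad f(m, t) \ge \bar{p}
\end{equation}
```

In this notation, the conclusion stated in the abstract reads as follows: whenever $c_t > 0$, the least-cost solution found in the case study has $m > 0$, that is, it uses at least some manually created data.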