For many tasks, state-of-the-art results have been achieved with Transformer-based architectures, resulting in a paradigm shift in practice from the use of task-specific architectures to the fine-tuning of pre-trained language models. The ongoing trend consists in training models with an ever-increasing number of parameters on an ever-increasing amount of data, which requires considerable resources. This has led to an intense search for resource efficiency based on algorithmic and hardware improvements, which are evaluated only for English. It raises questions about their usability for small-scale learning problems, where only a limited amount of training data is available, especially for tasks in under-resourced languages. The lack of appropriately sized corpora hinders the application of data-driven and transfer learning-based approaches, and leads to strong cases of instability. In this paper, we survey the state of the art of efforts dedicated to the usability of Transformer-based models, and propose to evaluate these improvements on question-answering performance for French, a language with few resources. We address the instability related to data scarcity by investigating various training strategies based on data augmentation, hyperparameter optimization and cross-lingual transfer. We also introduce a new compact model for French, FrALBERT, which proves competitive in low-resource settings.
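As a minimal sketch of the evaluation setting described above, the snippet below shows extractive question answering in French with a compact pre-trained model via the HuggingFace transformers library. The checkpoint identifier "qwant/fralbert-base", the example passage, and the question are assumptions for illustration, not artifacts described in this paper; in practice the model would first be fine-tuned on a French QA corpus before the span prediction is meaningful.

```python
# A minimal sketch, assuming a FrALBERT checkpoint published on the
# HuggingFace hub under "qwant/fralbert-base" (an assumption, not a
# detail stated in this abstract). The example question/context pair
# is hypothetical.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_name = "qwant/fralbert-base"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

question = "Quelle est la capitale de la France ?"
context = "Paris est la capitale de la France."

# Encode the (question, context) pair; the QA head predicts the start
# and end positions of the answer span within the context tokens.
inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

start = outputs.start_logits.argmax(-1).item()
end = outputs.end_logits.argmax(-1).item()
answer = tokenizer.decode(inputs["input_ids"][0][start : end + 1])
print(answer)
```

During fine-tuning, the gold answer's start and end token positions would be passed to the model so that the span-boundary classifiers are trained jointly, which is the standard extractive QA recipe this sketch assumes.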