We investigate the effectiveness of ensembles of pretrained transformer-based language models on short answer questions using the Kaggle Automated Short Answer Scoring dataset. We fine-tune a collection of popular small, base, and large pretrained transformer-based language models, and additionally train one feature-based model on the dataset, with the aim of testing ensembles of these models. Training uses an early stopping mechanism and hyperparameter optimization. We observe that the larger models generally perform slightly better; however, on their own they still fall short of state-of-the-art results. When we consider ensembles of models, certain ensembles of several large networks do produce state-of-the-art results; however, these ensembles are too large to realistically deploy in a production environment.