Machine Reading Comprehension (MRC) is an active field in natural language processing, with many successful models developed in recent years. Despite their high in-distribution accuracy, these models suffer from two issues: high training cost and low out-of-distribution accuracy. Although some approaches have been proposed to tackle the generalization problem, they incur prohibitively high training costs. In this paper, we investigate the effect of ensemble learning on improving the generalization of MRC systems without retraining a large model. The base models, which have different architectures, are first trained separately on different datasets and then ensembled using weighting and stacking approaches in both probabilistic and non-probabilistic settings. Three configurations (heterogeneous, homogeneous, and hybrid) are investigated on eight datasets and six state-of-the-art models. We identify the factors that determine the effectiveness of ensemble methods, and we compare the robustness of ensemble and fine-tuned models against data distribution shifts. The experimental results show that the ensemble approach is effective and robust in improving the out-of-distribution accuracy of MRC systems, especially when the base models have similar accuracies.
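To make the weighting approach concrete, the following is a minimal sketch (not the paper's implementation) of probabilistic weighted ensembling for extractive MRC: each base model emits start/end probability distributions over context tokens, the ensemble mixes them with per-model weights, and a span is decoded from the mixture. The function name, the weight choice, and the span-length cap are illustrative assumptions.

```python
# Hypothetical sketch of weighted ensembling for extractive MRC models.
# Assumes each base model outputs start/end probability distributions
# over the same tokenized context; weights could come from dev-set
# accuracies (an assumption, not necessarily the paper's scheme).
from typing import List, Tuple
import numpy as np

def ensemble_span(
    start_probs: List[np.ndarray],  # one (seq_len,) distribution per base model
    end_probs: List[np.ndarray],    # one (seq_len,) distribution per base model
    weights: List[float],           # per-model weights, normalized below
    max_span_len: int = 30,         # cap on answer span length (assumption)
) -> Tuple[int, int]:
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                    # normalize so the mixture stays a distribution
    start = sum(wi * p for wi, p in zip(w, start_probs))
    end = sum(wi * p for wi, p in zip(w, end_probs))
    # Decode the span (i, j) with i <= j that maximizes start[i] * end[j].
    best, best_score = (0, 0), -1.0
    for i in range(len(start)):
        for j in range(i, min(i + max_span_len, len(end))):
            score = start[i] * end[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best
```

A stacking variant would instead feed the base models' distributions into a trained meta-model rather than fixing the weights by hand.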