In many machine learning tasks, models are trained to predict structured data such as graphs. For example, in natural language processing, it is very common to parse texts into dependency trees or abstract meaning representation (AMR) graphs. Ensemble methods, on the other hand, combine the predictions of multiple models into a new prediction that is more robust and accurate than any individual one. Many ensembling techniques have been proposed in the literature for classification and regression problems; however, ensemble graph prediction has not been studied thoroughly. In this work, we formalize this problem as mining the largest graph that is most supported by a collection of graph predictions. As the problem is NP-hard, we propose an efficient heuristic algorithm to approximate the optimal solution. To validate our approach, we carried out experiments on AMR parsing. The experimental results demonstrate that the proposed approach can combine the strengths of state-of-the-art AMR parsers to create new predictions that are more accurate than those of any individual model on five standard benchmark datasets.
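To make the notion of "support" concrete, here is a minimal Python sketch under a strong simplifying assumption: each prediction is a set of (source, relation, target) triples whose concept labels are already aligned across parsers, so triples can be compared by plain equality. This is not the heuristic proposed in this work. Aligning the nodes of different predictions is what makes the general problem hard, and the sketch skips that step entirely. The names `ensemble_graph` and `min_support` are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical representation: (source concept, relation, target concept),
# assuming node labels are already aligned across all predicted graphs.
Triple = Tuple[str, str, str]


def ensemble_graph(graphs: List[Set[Triple]], min_support: float = 0.5) -> Set[Triple]:
    """Keep every triple contained in more than `min_support` of the predictions.

    Because each graph is a set, a triple is counted at most once per graph,
    so its count equals the number of predictions that support it.
    """
    counts = Counter(t for g in graphs for t in g)
    threshold = min_support * len(graphs)
    return {t for t, c in counts.items() if c > threshold}


if __name__ == "__main__":
    # Three toy predictions for "The boy wants to go."
    g1 = {("want-01", "ARG0", "boy"), ("want-01", "ARG1", "go-02"), ("go-02", "ARG0", "boy")}
    g2 = {("want-01", "ARG0", "boy"), ("want-01", "ARG1", "go-02")}
    g3 = {("want-01", "ARG0", "girl"), ("want-01", "ARG1", "go-02"), ("go-02", "ARG0", "girl")}

    merged = ensemble_graph([g1, g2, g3])
    print(sorted(merged))  # [('want-01', 'ARG0', 'boy'), ('want-01', 'ARG1', 'go-02')]
```

With three predictions and the default threshold, a triple must appear in at least two of them to survive, so the conflicting `girl` edges from the third prediction and the `go-02 ARG0 boy` edge found only in the first are voted out.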