Several metrics have been proposed for assessing the similarity of (abstract) meaning representations (AMRs), but little is known about how they relate to human similarity ratings. Moreover, the current metrics have complementary strengths and weaknesses: some emphasize speed, while others make the alignment of graph structures explicit, at the price of a costly alignment step. In this work we propose new Weisfeiler-Leman AMR similarity metrics that unify the strengths of previous metrics, while mitigating their weaknesses. Specifically, our new metrics are able to match contextualized substructures and induce n:m alignments between their nodes. Furthermore, we introduce a Benchmark for AMR Metrics based on Overt Objectives (BAMBOO), the first benchmark to support empirical assessment of graph-based MR similarity metrics. BAMBOO maximizes the interpretability of results by defining multiple overt objectives that range from sentence similarity objectives to stress tests that probe a metric's robustness against meaning-altering and meaning-preserving graph transformations. We show the benefits of BAMBOO by profiling previous metrics and our own metrics. Results indicate that our novel metrics may serve as a strong baseline for future work.
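To make the idea of matching contextualized substructures more concrete, the following is a minimal sketch of a generic Weisfeiler-Leman style graph similarity. It is not the metric proposed in this work. The graph encoding (a dict of node labels plus a list of `(source, relation, target)` triples), the function names `wl_features` and `wl_similarity`, and the use of cosine similarity over feature counts are all assumptions made for illustration. The sketch only propagates labels along outgoing edges and does not attempt the n:m node alignments mentioned above.

```python
from collections import Counter
from math import sqrt


def wl_features(nodes, edges, iterations=2):
    """Collect Weisfeiler-Leman style features for a labeled graph.

    nodes: dict mapping node id -> label (hypothetical encoding)
    edges: list of (source id, relation label, target id) triples
    """
    labels = dict(nodes)
    # Iteration 0: the plain node labels.
    features = Counter(labels.values())
    for _ in range(iterations):
        new_labels = {}
        for n in nodes:
            # Contextualize each node with its outgoing relations and
            # the current labels of the nodes they point to.
            neigh = sorted(f"{rel}>{labels[t]}" for s, rel, t in edges if s == n)
            new_labels[n] = labels[n] + "(" + ",".join(neigh) + ")"
        labels = new_labels
        features.update(labels.values())
    return features


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def wl_similarity(g1, g2, iterations=2):
    """Similarity of two graphs, each given as a (nodes, edges) pair."""
    return cosine(wl_features(*g1, iterations), wl_features(*g2, iterations))


# Toy graphs loosely in the spirit of AMR: "the cat sleeps" vs. "the dog sleeps".
cat_graph = ({"s": "sleep-01", "c": "cat"}, [("s", ":ARG0", "c")])
dog_graph = ({"s": "sleep-01", "d": "dog"}, [("s", ":ARG0", "d")])

same = wl_similarity(cat_graph, cat_graph)
different = wl_similarity(cat_graph, dog_graph)
```

In this toy example the two graphs share only the bare `sleep-01` label. Once the label is contextualized with its argument, the features diverge, so `different` comes out well below `same`.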