Human evaluation for summarization tasks is reliable but brings in issues of reproducibility and high cost. Automatic metrics are cheap and reproducible but sometimes correlate poorly with human judgment. In this work, we propose flexible semi-automatic to automatic summary evaluation metrics, following the Pyramid human evaluation method. The semi-automatic Lite2Pyramid retains the reusable human-labeled Summary Content Units (SCUs) for the reference(s) but replaces the manual work of judging SCUs' presence in system summaries with a natural language inference (NLI) model. The fully automatic Lite3Pyramid further substitutes SCUs with Semantic Triplet Units (STUs) extracted automatically via a semantic role labeling (SRL) model. Finally, we propose in-between metrics, Lite2.xPyramid, where a simple regressor predicts how well each STU can simulate its SCU, and we retain only the SCUs that are harder to simulate, providing a smooth transition and balance between automation and manual evaluation. Comparing against 15 existing metrics, we evaluate human-metric correlations on 3 existing meta-evaluation datasets and our newly collected PyrXSum (with 100/10 XSum examples/systems). The results show that Lite2Pyramid consistently has the best summary-level correlations; Lite3Pyramid performs better than or comparably to other automatic metrics; and Lite2.xPyramid trades small correlation drops for larger reductions in manual effort, which can lower the cost of future data collection. Our code and data are publicly available at: https://github.com/ZhangShiyue/Lite2-3Pyramid
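To make the SCU-presence step concrete, below is a minimal sketch of how an off-the-shelf NLI model can stand in for the human judgment in Lite2Pyramid. It assumes the public `roberta-large-mnli` checkpoint and a simple average of entailment probabilities over SCUs; the exact model and aggregation used in the paper may differ (see the repository above).

```python
# Minimal sketch of NLI-based SCU presence judgment (Lite2Pyramid-style).
# Assumptions (not taken verbatim from the paper): the public
# "roberta-large-mnli" checkpoint, scoring each SCU against the whole
# system summary, and averaging entailment probabilities over SCUs.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")
model.eval()

def lite2pyramid_score(system_summary: str, scus: list[str]) -> float:
    """Average NLI entailment probability of each SCU given the summary."""
    probs = []
    for scu in scus:
        # Premise = system summary, hypothesis = SCU.
        inputs = tokenizer(system_summary, scu,
                           return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        # roberta-large-mnli label order: 0=contradiction, 1=neutral, 2=entailment
        probs.append(torch.softmax(logits, dim=-1)[0, 2].item())
    return sum(probs) / max(len(probs), 1)
```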
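Similarly, the following is a hedged sketch of the STU extraction behind Lite3Pyramid, using AllenNLP's BERT-based SRL tagger. Approximating an STU as an (ARG0, predicate, ARG1) triplet is an illustrative simplification; the paper's extraction rules are more involved.

```python
# Hedged sketch of SRL-based STU extraction (Lite3Pyramid's automatic units).
# Assumption: an STU is approximated as an (ARG0, predicate, ARG1) triplet
# read off AllenNLP's public SRL model. Requires: pip install allennlp allennlp-models
from allennlp.predictors.predictor import Predictor

SRL_MODEL = ("https://storage.googleapis.com/allennlp-public-models/"
             "structured-prediction-srl-bert.2020.12.15.tar.gz")
predictor = Predictor.from_path(SRL_MODEL)

def extract_stus(sentence: str) -> list[str]:
    """Turn each predicate-argument frame into a short declarative unit."""
    out = predictor.predict(sentence=sentence)
    words = out["words"]
    stus = []
    for frame in out["verbs"]:
        # Collect the words of each labeled span from the BIO tags,
        # e.g. "B-ARG0" / "I-ARG0" -> role "ARG0".
        spans: dict[str, list[str]] = {}
        for word, tag in zip(words, frame["tags"]):
            if tag == "O":
                continue
            role = tag.split("-", 1)[1]
            spans.setdefault(role, []).append(word)
        if "ARG0" in spans and "ARG1" in spans:
            stus.append(" ".join(spans["ARG0"] + [frame["verb"]] + spans["ARG1"]))
    return stus
```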