We propose a method for evaluating the quality of generated text in which evaluators count facts, and precision, recall, F-score, and accuracy are computed from the raw counts. We believe this approach yields a more objective and more easily reproducible evaluation. We apply it to the task of medical report summarisation, where measuring quality and accuracy objectively is of paramount importance.
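As a rough illustration of how metrics might be derived from raw fact counts, the following minimal sketch assumes three hypothetical counts per generated summary: facts judged correct, facts judged incorrect, and reference facts that were omitted. The abstract does not specify these categories or the exact definition of accuracy; the names `FactCounts` and `fact_metrics`, and the accuracy formula used here (correct facts over all counted facts), are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactCounts:
    """Hypothetical raw counts reported by one evaluator for one summary."""
    correct: int    # facts in the generated text judged correct
    incorrect: int  # facts in the generated text judged incorrect
    omitted: int    # facts in the reference missing from the generated text


def _ratio(numerator: float, denominator: float) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def fact_metrics(counts: FactCounts) -> dict:
    """Compute precision, recall, F-score, and accuracy from fact counts.

    Assumption: accuracy is taken as correct / (correct + incorrect + omitted),
    since no true-negative count exists in this setting.
    """
    precision = _ratio(counts.correct, counts.correct + counts.incorrect)
    recall = _ratio(counts.correct, counts.correct + counts.omitted)
    f_score = _ratio(2 * precision * recall, precision + recall)
    accuracy = _ratio(
        counts.correct,
        counts.correct + counts.incorrect + counts.omitted,
    )
    return {
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "accuracy": accuracy,
    }


# Example with made-up counts: 8 correct, 2 incorrect, 2 omitted facts.
metrics = fact_metrics(FactCounts(correct=8, incorrect=2, omitted=2))
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because every metric is a function of the counts alone, two evaluators who agree on the counts necessarily agree on the scores, which is one way to read the reproducibility claim above.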