The Shared Task on Evaluating Accuracy focused on techniques (both manual and automatic) for evaluating the factual accuracy of texts produced by neural NLG systems in a sports-reporting domain. Four teams submitted evaluation techniques for this task, using very different approaches. The best-performing submissions did encouragingly well at this difficult task. However, all automatic submissions struggled to detect factual errors that are semantically or pragmatically complex (for example, errors based on incorrect computation or inference).