We observe a severe under-reporting of the different kinds of errors that Natural Language Generation systems make. This is a problem, because mistakes are an important indicator of where systems should still be improved. If authors only report overall performance metrics, the research community is left in the dark about the specific weaknesses exhibited by `state-of-the-art' research. In addition to quantifying the extent of error under-reporting, this position paper provides recommendations for error identification, analysis, and reporting.