To address a looming crisis of unreproducible evaluation for named entity recognition, we propose guidelines and introduce SeqScore, a software package to improve reproducibility. The guidelines we propose are extremely simple and center around transparency regarding how chunks are encoded and scored. We demonstrate that despite the apparent simplicity of NER evaluation, unreported differences in the scoring procedure can result in changes to scores that are both of noticeable magnitude and statistically significant. We describe SeqScore, which addresses many of the issues that cause replication failures.
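To make the claim about unreported scoring differences concrete, the following minimal sketch decodes the same predicted BIO label sequence under two policies for invalid labels and scores each against the same gold labels. It assumes BIO encoding and exact-match chunk F1. The names `decode_bio`, `f1`, `GOLD`, and `PRED` are hypothetical illustrations and are not SeqScore's API. The "begin" policy follows the spirit of the conlleval script, which starts a new chunk at an `I-` label that does not continue one; the "discard" policy instead treats such labels as outside any chunk.

```python
from typing import List, Set, Tuple

# A chunk is (entity type, start index, exclusive end index).
Chunk = Tuple[str, int, int]


def decode_bio(labels: List[str], repair: str = "begin") -> Set[Chunk]:
    """Decode BIO labels into chunks, handling invalid I- labels per `repair`.

    repair="begin":   an I- label that does not continue the open chunk
                      starts a new chunk (conlleval-like behavior).
    repair="discard": such an I- label is treated as O.
    """
    if repair not in ("begin", "discard"):
        raise ValueError(f"unknown repair policy: {repair}")
    chunks: Set[Chunk] = set()
    start, ctype = None, None
    # A trailing "O" sentinel closes any chunk still open at the end.
    for i, label in enumerate(labels + ["O"]):
        prefix, _, etype = label.partition("-")
        if prefix == "I" and ctype is not None and etype == ctype:
            continue  # valid continuation of the open chunk
        if ctype is not None:
            chunks.add((ctype, start, i))
            start, ctype = None, None
        if prefix == "B" or (prefix == "I" and repair == "begin"):
            start, ctype = i, etype
    return chunks


def f1(gold: Set[Chunk], pred: Set[Chunk]) -> float:
    """Exact-match chunk F1; returns 0.0 when there are no true positives."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


GOLD = ["B-PER", "I-PER", "O", "B-LOC", "O"]
# Invalid in two places: starts with I-PER, and I-ORG follows B-LOC.
PRED = ["I-PER", "I-PER", "O", "B-LOC", "I-ORG"]

if __name__ == "__main__":
    gold_chunks = decode_bio(GOLD)
    for policy in ("begin", "discard"):
        score = f1(gold_chunks, decode_bio(PRED, repair=policy))
        print(f"{policy}: F1 = {score:.3f}")
```

On this toy input the two policies yield different F1 values for identical system output, which is the kind of discrepancy that goes unnoticed unless the encoding and the handling of invalid sequences are reported alongside the scores.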