Although many researchers have used the well-known E2E dataset for MR (meaning representation)-to-text generation, many of its MR-text pairs contain deletion, insertion, and substitution errors. Because such errors degrade the quality of MR-to-text systems trained on the data, they should be corrected as far as possible. We therefore built a refined dataset, together with Python scripts that convert the original E2E dataset into the refined one.
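To make the three error types concrete, here is a minimal sketch of how they could be flagged for a single MR-text pair. It is not the conversion scripts described above. It assumes MRs in the `slot[value], slot[value], ...` form used by the original E2E CSV files, and it relies on naive whole-word string matching. The names `parse_mr`, `mentions`, `classify_errors`, and `VERBATIM_SLOTS`, and the choice of which slots to check, are hypothetical and chosen only for illustration.

```python
import re

# Matches one "slot[value]" item of an E2E-style MR string.
MR_ITEM = re.compile(r"\s*([^\[,]+?)\s*\[([^\]]*)\]")

# Hypothetical assumption: only these slots are checked, because their values
# are usually copied verbatim into the text. Slots such as familyFriendly[yes]
# or customer rating[5 out of 5] are paraphrased and need a smarter matcher.
VERBATIM_SLOTS = ("name", "eatType", "food", "area", "near")


def parse_mr(mr: str) -> dict[str, str]:
    """Parse 'name[The Eagle], food[French]' into {'name': 'The Eagle', ...}."""
    return {m.group(1): m.group(2) for m in MR_ITEM.finditer(mr)}


def mentions(text: str, value: str) -> bool:
    """Case-insensitive whole-word search for a slot value in the text."""
    pattern = r"\b" + re.escape(value) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def classify_errors(mr: str, text: str, vocab: dict[str, set[str]]) -> dict[str, list[str]]:
    """Naively label deletion/insertion/substitution errors for one MR-text pair.

    vocab maps each slot to all values it takes anywhere in the dataset.
    """
    slots = parse_mr(mr)
    mr_values = set(slots.values())
    # Slots whose MR value never appears in the text.
    missing = {s for s in VERBATIM_SLOTS if s in slots and not mentions(text, slots[s])}
    # Slots for which the text mentions a value that the MR does not contain.
    extra = {
        s
        for s in VERBATIM_SLOTS
        for v in vocab.get(s, ())
        if v not in mr_values and mentions(text, v)
    }
    return {
        "deletion": sorted(missing - extra),      # MR value dropped, nothing in its place
        "insertion": sorted(extra - missing),     # unrequested value added
        "substitution": sorted(missing & extra),  # MR value replaced by another value
    }


vocab = {
    "name": {"The Eagle", "Alimentum"},
    "eatType": {"pub", "coffee shop", "restaurant"},
    "food": {"French", "Italian", "Chinese"},
    "area": {"riverside", "city centre"},
    "near": {"Burger King", "Café Rouge"},
}
mr = "name[The Eagle], eatType[pub], food[French], near[Burger King]"
text = "The Eagle is an Italian pub near Burger King in the city centre."
print(classify_errors(mr, text, vocab))
```

Treating a slot that is both missing and "extra" as a substitution keeps the three categories disjoint. Real E2E references paraphrase freely, however, so a practical refinement would need fuzzier matching than this.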