Recent advances in data-to-text generation largely take the form of neural end-to-end systems. One line of work improves text generation systems by changing the order in which training samples are presented, a process known as curriculum learning. Past research on sequence-to-sequence learning showed that curriculum learning improves both performance and convergence speed. In this work, we apply the same idea to training samples that consist of pairs of structured data and text: at each update, the curriculum framework selects training samples based on the model's competence. Specifically, we experiment with various difficulty metrics and propose a soft edit distance metric for ranking training samples. Our benchmarks show faster convergence, with training time reduced by 38.7% and performance improved by 4.84 BLEU.
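The competence-based selection described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the square-root competence schedule, the initial competence `c0`, and all function names are hypothetical, and a plain token-level Levenshtein distance between the linearised data and the reference text stands in for the proposed soft edit distance, which the abstract does not define.

```python
import math
import random


def competence(step, total_steps, c0=0.1):
    """Assumed square-root schedule: competence grows from c0 to 1.0."""
    if step >= total_steps:
        return 1.0
    progress = step / total_steps
    return min(1.0, math.sqrt(progress * (1.0 - c0 ** 2) + c0 ** 2))


def edit_distance(a, b):
    """Plain Levenshtein distance between two sequences (stand-in metric)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def rank_by_difficulty(pairs):
    """Sort (data, text) pairs from easiest to hardest and attach a
    difficulty percentile in (0, 1] to each pair."""
    scored = sorted(pairs, key=lambda p: edit_distance(p[0].split(), p[1].split()))
    n = len(scored)
    return [((i + 1) / n, pair) for i, pair in enumerate(scored)]


def sample_batch(ranked, step, total_steps, batch_size, rng):
    """Sample uniformly from pairs whose percentile is within the
    model's current competence; fall back to the easiest pair."""
    c = competence(step, total_steps)
    eligible = [pair for pct, pair in ranked if pct <= c]
    if not eligible:
        eligible = [ranked[0][1]]
    return [rng.choice(eligible) for _ in range(batch_size)]
```

Early in training only the lowest-ranked pairs are eligible, and the pool widens to the full training set as `step` approaches `total_steps`. Swapping in a different difficulty metric only requires changing the sort key in `rank_by_difficulty`.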