Run-on sentences are common grammatical mistakes, but little research has tackled this problem to date. This work introduces two machine learning models for correcting run-on sentences; both outperform leading methods for two related tasks, punctuation restoration and whole-sentence grammatical error correction. Because annotated data for this error is limited, we experiment with artificially generating training data from clean newswire text. Our findings suggest that artificial training data is viable for this task. We discuss implications for correcting run-ons and other types of mistakes that have low coverage in error-annotated corpora.
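The abstract does not say how the artificial run-ons are produced. The sketch below shows one plausible scheme. It is hypothetical: the function names `make_run_on` and `synthesize`, the merging rule, and the token-level labels are assumptions, not the authors' confirmed method. The scheme takes adjacent sentences from clean text, deletes the period between them, and lowercases the next word. A label marks the token after which the boundary should be restored, so that correction can be treated as predicting one label per token.

```python
from typing import List, Tuple


def make_run_on(first: str, second: str) -> Tuple[str, List[int]]:
    """Join two clean sentences into a synthetic run-on (hypothetical scheme).

    Returns the run-on text and one label per whitespace token, where 1 marks
    the token after which the original sentence boundary belongs.
    """
    first, second = first.strip(), second.strip()
    if not first or not second:
        raise ValueError("both sentences must be non-empty")
    # Remove the terminal period of the first sentence.
    if first.endswith("."):
        first = first[:-1]
    # Lowercase the first character of the second sentence. A real pipeline
    # would skip proper nouns and "I"; that check is omitted in this sketch.
    second = second[:1].lower() + second[1:]
    first_tokens, second_tokens = first.split(), second.split()
    tokens = first_tokens + second_tokens
    labels = [0] * len(tokens)
    labels[len(first_tokens) - 1] = 1  # boundary after the last token of `first`
    return " ".join(tokens), labels


def synthesize(sentences: List[str]) -> List[Tuple[str, List[int]]]:
    """Pair consecutive sentences (non-overlapping) into run-on examples."""
    return [
        make_run_on(sentences[i], sentences[i + 1])
        for i in range(0, len(sentences) - 1, 2)
    ]


if __name__ == "__main__":
    text, labels = make_run_on("The market fell sharply.", "Investors were worried.")
    print(text)    # The market fell sharply investors were worried.
    print(labels)  # [0, 0, 0, 1, 0, 0, 0]
```

Pairing sentences without overlap keeps each clean sentence in only one synthetic example. A fuller setup would presumably also keep some unmodified sentences as negative examples, so that a model does not learn to insert a boundary into every input.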