This study examines the effect of semi-supervised learning in combination with pretrained language models for data-to-text generation. It is unclear whether semi-supervised learning remains helpful when a large-scale language model is also employed. This study aims to answer that question by comparing a data-to-text system enriched only with a language model against two systems that are additionally extended with a data augmentation or a pseudo-labeling semi-supervised learning approach. Results show that semi-supervised learning yields higher scores on diversity metrics. In terms of output quality, extending the training set of a language-model-based data-to-text system via the pseudo-labeling approach did increase text quality scores, whereas the data augmentation approach produced scores similar to those of the system without training set extension. These results indicate that semi-supervised learning approaches can bolster output quality and diversity even when a pretrained language model is also present.