Plug-and-play language models (PPLMs) enable topic-conditioned natural language generation by pairing large pre-trained generators with attribute models that steer the predicted token distribution towards a selected topic. Despite their computational efficiency, PPLMs require large amounts of labeled text to effectively balance generation fluency and proper conditioning, making them unsuitable for low-resource settings. We present ETC-NLG, an approach leveraging topic modeling annotations to enable fully unsupervised End-to-end Topic-Conditioned Natural Language Generation over emergent topics in unlabeled document collections. We first test the effectiveness of our approach in a low-resource setting for Italian, evaluating the conditioning for both topic models and gold annotations. We then perform a comparative evaluation of ETC-NLG for Italian and English using a parallel corpus. Finally, we propose an automatic approach to estimate the effectiveness of conditioning on the generated utterances.
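As a concrete illustration of the pipeline described above, the sketch below derives unsupervised topic labels from an unlabeled corpus and uses them to bias a toy next-token distribution towards one topic. It is a minimal sketch, not the ETC-NLG implementation: the topic model, the example corpus, and the mixing weight `alpha` are illustrative assumptions, the language-model distribution is a uniform placeholder, and the simple probability mixture only stands in for PPLM's gradient-based updates of the generator's hidden states.

```python
# Minimal sketch of the ETC-NLG idea (illustrative assumptions throughout):
# 1) derive topic labels for an unlabeled corpus with a topic model,
# 2) turn those labels into a bag-of-words attribute signal,
# 3) bias a next-token distribution towards a chosen topic.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
import numpy as np

corpus = [
    "the team won the championship after a dramatic final",
    "the government passed a new tax reform bill",
    "the striker scored twice in the second half",
    "parliament debated the proposed budget amendments",
]

# Step 1: unsupervised topic annotation (a stand-in for the contextualized
# topic models used in the paper).
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(corpus)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)
pseudo_labels = doc_topics.argmax(axis=1)  # one emergent topic per document

# Step 2: a toy bag-of-words attribute model: per-topic word distributions.
vocab = np.array(vectorizer.get_feature_names_out())
topic_word = lda.components_ / lda.components_.sum(axis=1, keepdims=True)

# Step 3: steer a placeholder next-token distribution towards topic 0 by
# mixing in that topic's word distribution; a real PPLM instead back-propagates
# the attribute model's gradient into the generator's hidden states.
lm_probs = np.full(len(vocab), 1.0 / len(vocab))  # uniform placeholder LM
alpha = 0.5                                       # conditioning strength (assumed)
steered = (1 - alpha) * lm_probs + alpha * topic_word[0]
steered /= steered.sum()
print(vocab[np.argsort(-steered)[:5]])            # words pulled towards topic 0
```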