Neural table-to-text generation approaches are data-hungry, limiting their adoption in low-resource real-world applications. Previous works mostly resort to Pre-trained Language Models (PLMs) to generate fluent summaries of a table. However, the generated outputs often contain hallucinated content due to the uncontrolled nature of PLMs. Moreover, the topological differences between tables and sequences are rarely studied. Finally, fine-tuning PLMs on a handful of instances may lead to over-fitting and catastrophic forgetting. To alleviate these problems, we propose a prompt-based approach, the Prefix-Controlled Generator (PCG), for few-shot table-to-text generation. We prepend a task-specific prefix to make the table structure better fit the PLM's pre-trained input format. In addition, we generate an input-specific prefix to control the factual content and word order of the generated text. Both automatic and human evaluations on different domains (humans, books, and songs) of the Wikibio dataset show substantial improvements over baseline approaches.