In this work, we study user-controlled table-to-text generation, in which users explore the content of a table by selecting cells and reading a natural language description of them that is automatically produced by a natural language generator. Such generation models usually learn from carefully selected cell combinations (clean cell selections); in practice, however, users may select unexpected, redundant, or incoherent cell combinations (noisy cell selections). In experiments, we find that models perform well on test sets drawn from the same distribution as the training data, but their performance drops when they are evaluated on realistic noisy user inputs. We propose a fine-tuning regime that adds user-simulated noisy cell selections. Models fine-tuned with the proposed regime gain 4.85 BLEU points on noisy user test cases and 1.4 on clean test cases, and achieve performance comparable to the state of the art on the ToTTo dataset.
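To make the notion of a noisy cell selection concrete, the following is a minimal sketch of one way such noise could be simulated. It is an illustration only, not the procedure used in this work: the function name `simulate_noisy_selection`, its parameters, and the noise model (adding a few randomly chosen, unselected cells to a clean selection) are all assumptions introduced here.

```python
import random
from typing import List, Tuple

Cell = Tuple[int, int]  # (row index, column index) of a table cell


def simulate_noisy_selection(
    clean_cells: List[Cell],
    n_rows: int,
    n_cols: int,
    n_extra: int = 2,
    seed: int = 0,
) -> List[Cell]:
    """Return the clean selection plus up to n_extra other cells.

    Hypothetical noise model (an assumption, not the paper's method):
    a user highlights the intended cells and also a few unrelated ones.
    """
    rng = random.Random(seed)  # local generator keeps the result reproducible
    selected = set(clean_cells)

    # Every cell in the table that is not already part of the clean selection.
    candidates = [
        (r, c)
        for r in range(n_rows)
        for c in range(n_cols)
        if (r, c) not in selected
    ]

    # Sample without replacement; cap at the number of available cells.
    extra = rng.sample(candidates, k=min(n_extra, len(candidates)))

    # Sort so the output order does not depend on set iteration order.
    return sorted(selected | set(extra))
```

Under these assumptions, calling `simulate_noisy_selection([(0, 1), (2, 3)], n_rows=4, n_cols=5)` returns the two original cells together with two additional cells from the same 4 x 5 table. Pairs of clean and noisy selections built this way could then serve as extra fine-tuning inputs.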