Show and Write: Entity-aware News Generation with Image Information

Automatically writing long articles is a complex and challenging language generation task. Prior work has primarily focused on generating these articles from human-written prompts that provide some topical context and some metadata about the article. However, in many applications, such as generating news stories, articles are often paired with images and their captions or alt-text, which in turn are based on real-world events and may reference many different named entities that are difficult for language models to recognize and predict correctly. To address these two problems, this paper introduces Engin, an Entity-aware News Generation method with Image iNformation, which incorporates news image information into language models. Engin produces news articles conditioned on both metadata and image-derived information, such as captions and named entities extracted from images. We also propose an entity-aware mechanism to help our model better recognize and predict the entity names in news. We perform experiments on two large-scale public news datasets, GoodNews and VisualNews. Quantitative results show that our approach improves (that is, lowers) article perplexity by 4-5 points over the base models. Qualitative results demonstrate that the text generated by Engin is more consistent with the news images. We also perform an article-quality annotation experiment on the generated articles, which validates that our model produces higher-quality articles. Finally, we investigate the effect Engin has on methods that automatically detect machine-generated articles.
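To make the two quantitative ideas above concrete, here is a minimal sketch in Python. It is not Engin's implementation. It assumes, hypothetically, that metadata, captions, and extracted entity names are flattened into one text prefix that a language model conditions on before generating the article body. The `NewsExample` fields, the `<domain>`/`<caption>`/`<entities>` marker strings, and `build_context` are illustrative names invented here and are not taken from the paper or from the GoodNews/VisualNews schemas. The `perplexity` helper uses the standard definition: the exponential of the mean negative log-likelihood per token. Because lower is better, "improving perplexity by 4-5 points" means this number drops by that amount.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class NewsExample:
    # Hypothetical schema: field names are illustrative only.
    headline: str
    domain: str
    date: str
    captions: List[str] = field(default_factory=list)  # image captions / alt-text
    entities: List[str] = field(default_factory=list)  # named entities tied to the images


def build_context(ex: NewsExample) -> str:
    """Flatten metadata plus image-derived information into one conditioning prefix.

    The marker strings are assumptions for illustration, not Engin's format.
    """
    parts = [
        f"<domain> {ex.domain}",
        f"<date> {ex.date}",
        f"<headline> {ex.headline}",
    ]
    for cap in ex.captions:
        parts.append(f"<caption> {cap}")
    if ex.entities:
        parts.append("<entities> " + ", ".join(ex.entities))
    # The model would generate the article body after this marker.
    parts.append("<article>")
    return " ".join(parts)


def perplexity(token_log_probs: List[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood), using natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # Made-up example data, not drawn from either dataset.
    ex = NewsExample(
        headline="City council approves new park",
        domain="example.com",
        date="2020-05-01",
        captions=["Residents gather at the proposed park site."],
        entities=["City Council"],
    )
    print(build_context(ex))
    # A model that assigns probability 0.25 to every article token has perplexity about 4.
    print(perplexity([math.log(0.25)] * 10))
```

A useful intuition: perplexity equals the size of a uniform vocabulary that would be equally "surprised," so a model whose perplexity drops from, say, 20 to 16 is choosing among fewer plausible next tokens on average.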

翻译：自动撰写长篇文章是一项复杂且具有挑战性的语言生成任务。先前的工作主要侧重于利用人文快速生成这些文章,以提供一些专题背景和文章的某些元数据。也就是说,对于许多应用程序,例如制作新闻报道,这些文章往往配有图像及其标题或可变文本,而后者又以真实世界事件为基础,并可能提及许多难以被语言模型正确识别和预测的不同名称实体。为解决这两个问题,本文件引入了一个实体认知新闻生成方法,其中含有图像 image imnformation, Engin, 将新闻图像信息信息纳入语言模型。以元数据和信息为条件,例如标题和从图像中提取的命名实体等, Enteraware 制作了新闻文章。我们还建议一个实体认知机制,以帮助我们的模型更好地识别和预测新闻中的实体名称。我们对两个公共大型新闻数据集,即GoodNews和视觉新闻模型进行实验。定量结果显示,我们的方法在基础模型上增加了4-5点的不易理解性。定性结果显示, Engin 显示Engin 生成的文本与我们制作的高级文章的测试方法更加一致。