Recent advances in text-to-image synthesis make it possible to visualize machine imaginations for a given context. Meanwhile, when generating text, human writers are gifted at creative visualization: they enhance their writing by forming mental images as blueprints before putting the stories down in words. Inspired by this cognitive process, we ask a natural question: can we endow machines with the same ability to utilize visual information and construct a general picture of the context to guide text generation? In this work, we propose iNLG, which uses machine-generated images to guide language models in open-ended text generation. Experiments and analyses demonstrate the effectiveness of iNLG on open-ended text generation tasks, including text completion, story generation, and concept-to-text generation, in both few-shot and full-data scenarios. Both automatic metrics and human evaluations verify that the text snippets generated by our iNLG are coherent and informative while exhibiting little degeneration.
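To make the high-level idea concrete, the sketch below illustrates one plausible way to let a machine-generated image guide a language model: render the context with an off-the-shelf text-to-image model, encode the image, and prepend the projected embedding as a visual prefix before decoding. The specific models (Stable Diffusion, CLIP, GPT-2), the single-vector prefix, and the untrained `projector` are illustrative assumptions, not the paper's exact architecture.

```python
# A minimal, hypothetical sketch of image-guided text generation.
# Assumptions (not the paper's exact setup): Stable Diffusion as the
# text-to-image model, a CLIP vision encoder, GPT-2 as the language model,
# and an untrained linear `projector` mapping the image embedding into the
# LM embedding space as a single-token "visual prefix".
import torch
from diffusers import StableDiffusionPipeline
from transformers import (CLIPImageProcessor, CLIPVisionModel,
                          GPT2LMHeadModel, GPT2Tokenizer)

device = "cuda" if torch.cuda.is_available() else "cpu"
context = "A child builds a sandcastle on a windy beach"

# 1. Render a machine "imagination" of the textual context.
t2i = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
image = t2i(context).images[0]

# 2. Encode the generated image into a single embedding vector.
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")
vision_encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
pixels = processor(images=image, return_tensors="pt").pixel_values.to(device)
with torch.no_grad():
    image_embed = vision_encoder(pixels).pooler_output             # (1, 768)

# 3. Project the image embedding into the LM embedding space and prepend it
#    to the context embeddings (in practice this projector would be trained).
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()
projector = torch.nn.Linear(image_embed.size(-1), lm.config.n_embd).to(device)
visual_prefix = projector(image_embed).unsqueeze(1)                # (1, 1, n_embd)

text_ids = tokenizer(context, return_tensors="pt").input_ids.to(device)
text_embeds = lm.transformer.wte(text_ids)                         # (1, T, n_embd)
inputs_embeds = torch.cat([visual_prefix, text_embeds], dim=1)

# 4. Greedily decode a continuation conditioned on both text and image.
generated, past = text_ids, None
with torch.no_grad():
    for _ in range(40):
        out = lm(inputs_embeds=inputs_embeds, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
        generated = torch.cat([generated, next_id], dim=-1)
        inputs_embeds = lm.transformer.wte(next_id)                 # feed only the new token

print(tokenizer.decode(generated[0], skip_special_tokens=True))
```

In a trained system the projection (and possibly the language model) would be optimized so that decoding actually attends to the rendered scene; the sketch only shows where such a visual prefix enters the decoding loop.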