There has been a recent explosion of impressive generative models that can produce high-quality images (or videos) conditioned on text descriptions. However, all such approaches rely on conditioning sentences that describe the scene and its main actors unambiguously. Employing these models for the more complex task of story visualization, where references and co-references occur naturally and one must reason, based on story progression, about when to maintain consistency of actors and backgrounds across frames/scenes and when not to, therefore remains a challenge. In this work, we address these challenges and propose a novel autoregressive diffusion-based framework with a visual memory module that implicitly captures actor and background context across the generated frames. Sentence-conditioned soft attention over the memories enables effective reference resolution and learns to maintain scene and actor consistency when needed. To validate the effectiveness of our approach, we extend the MUGEN dataset, introducing additional characters, backgrounds, and referencing in multi-sentence storylines. Our experiments on story generation on the MUGEN, PororoSV, and FlintstonesSV datasets show that our method not only outperforms prior state-of-the-art in generating frames with high visual quality that are consistent with the story, but also models appropriate correspondences between the characters and the background.
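To make the memory mechanism concrete, below is a minimal, hypothetical PyTorch sketch of sentence-conditioned soft attention over a visual memory of past frame features. All module names, dimensions, and the scaled dot-product formulation are illustrative assumptions, not the paper's exact architecture; in the full framework the resulting context would condition the diffusion denoiser for the current frame.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceConditionedMemoryAttention(nn.Module):
    """Illustrative sketch (assumed design, not the paper's exact layers):
    the current sentence embedding queries a memory of past frame features,
    and soft attention pools a context vector for reference resolution."""

    def __init__(self, text_dim: int, mem_dim: int, attn_dim: int):
        super().__init__()
        # Project the sentence embedding into a query, and each memory
        # slot (features of a previously generated frame) into keys/values.
        self.to_query = nn.Linear(text_dim, attn_dim)
        self.to_key = nn.Linear(mem_dim, attn_dim)
        self.to_value = nn.Linear(mem_dim, attn_dim)

    def forward(self, sentence_emb: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # sentence_emb: (batch, text_dim) -- embedding of the current story sentence
        # memory:       (batch, num_past_frames, mem_dim) -- stored frame features
        q = self.to_query(sentence_emb).unsqueeze(1)            # (batch, 1, attn_dim)
        k = self.to_key(memory)                                 # (batch, T, attn_dim)
        v = self.to_value(memory)                               # (batch, T, attn_dim)
        scores = (q @ k.transpose(1, 2)) / k.shape[-1] ** 0.5   # (batch, 1, T)
        # Soft attention weights: how relevant each past frame is to
        # resolving the current sentence's (co-)references.
        weights = F.softmax(scores, dim=-1)
        context = (weights @ v).squeeze(1)                      # (batch, attn_dim)
        return context

# Usage with hypothetical dimensions: the context vector would be injected
# into the frame generator (e.g., via cross-attention) so that references
# like "he" or "the same room" resolve to the right actor or background.
attn = SentenceConditionedMemoryAttention(text_dim=512, mem_dim=768, attn_dim=256)
ctx = attn(torch.randn(2, 512), torch.randn(2, 4, 768))
print(ctx.shape)  # torch.Size([2, 256])
```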