Story Visualization is an advanced computer vision task that targets sequential image synthesis, where the generated samples need to be realistic, faithful to their conditioning, and sequentially consistent. Our work proposes a novel architectural and training approach: the Impartial Transformer achieves both text-relevant, plausible scenes and sequential consistency while using as few trainable parameters as possible. This enhancement can even handle the synthesis of 'hard' samples with occluded objects, and it achieves improved evaluation metrics compared to past approaches.