Generating Annotated High-Fidelity Images Containing Multiple Coherent Objects

Recent advances in generative models have made it possible to generate diverse, high-fidelity images. In particular, layout-to-image generation models have gained significant attention for their ability to generate realistic, complex images containing distinct objects. These models are generally conditioned on either semantic layouts or textual descriptions. However, in domains such as biomedical imaging and remote sensing, unlike for natural images, providing such auxiliary information can be extremely hard. In this work, we propose a multi-object generation framework that can synthesize images with multiple objects without explicitly requiring their contextual information during the generation process. Based on a vector-quantized variational autoencoder (VQ-VAE) backbone, our model learns to preserve spatial coherency within an image as well as semantic coherency between the objects and the background through two powerful autoregressive priors: PixelSNAIL and LayoutPixelSNAIL. While PixelSNAIL learns the distribution of the latent encodings of the VQ-VAE, LayoutPixelSNAIL specifically learns the semantic distribution of the objects. An implicit advantage of our approach is that the generated samples are accompanied by object-level annotations. Through experiments on the Multi-MNIST and CLEVR datasets, we demonstrate that our method preserves coherency and fidelity and outperforms state-of-the-art multi-object generative methods. We further demonstrate the efficacy of our approach on medical imaging datasets, where augmenting the training set with samples generated by our approach improves the performance of existing models.
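
To make the role of the VQ-VAE latent encodings concrete, the following is a minimal sketch of the generic vector-quantization step: each latent vector on a spatial grid is replaced by the index of its nearest codebook entry, and the resulting index grid is flattened in raster order, which is the kind of discrete sequence an autoregressive prior such as PixelSNAIL can model. This is an illustration under stated assumptions, not the authors' implementation: the function names (`quantize`, `raster_order`), the toy codebook, and the latent values are all hypothetical, and the abstract does not specify the encoder, the codebook size, or the prior architecture.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: List[List[Vector]], codebook: List[Vector]) -> List[List[int]]:
    """Map each latent vector in an H x W grid to its nearest codebook index.

    Hypothetical helper: a real VQ-VAE would obtain `latents` from a learned
    encoder and would learn `codebook` jointly with the encoder and decoder.
    """
    return [
        [
            min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
            for z in row
        ]
        for row in latents
    ]


def raster_order(indices: List[List[int]]) -> List[int]:
    """Flatten the index grid row by row, the ordering in which an
    autoregressive prior would typically factorize the joint distribution."""
    return [i for row in indices for i in row]


# Toy data (assumed for illustration): 3 codebook entries, a 2 x 2 latent grid.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
latents = [
    [[0.1, -0.1], [0.9, 1.2]],
    [[-0.8, 0.7], [0.2, 0.1]],
]

indices = quantize(latents, codebook)
sequence = raster_order(indices)
print(indices)
print(sequence)
```

In this simplified picture, a prior over `sequence` plays the role the abstract assigns to PixelSNAIL; how LayoutPixelSNAIL represents the semantic distribution of the objects is not described in enough detail here to sketch.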
