Autonomous agents, such as driverless cars, require large amounts of labeled visual data for their training. A viable approach for acquiring such data is to train a generative model on collected real data, and then augment that dataset with synthetic images from the model, generated with control over the scene layout and ground-truth labeling. In this paper we propose Full-Glow, a fully conditional Glow-based architecture for generating plausible and realistic images of novel street scenes, given a semantic segmentation map indicating the scene layout. Benchmark comparisons show that our model outperforms recent work in terms of the semantic segmentation performance of a pretrained PSPNet. This indicates that images from our model resemble real images of the same kinds of scenes and objects more closely than those from other models, making them suitable as training data for visual semantic segmentation or object recognition systems.
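To make the "fully conditional" idea concrete, the sketch below shows one way a Glow affine coupling layer can be conditioned on features derived from a segmentation map: the network predicting scale and shift sees both one half of the activations and the conditioning features. This is a minimal illustrative PyTorch sketch, not the paper's exact networks; the class name, layer sizes, and conditioning encoder are all assumptions made for exposition.

```python
import torch
import torch.nn as nn


class CondAffineCoupling(nn.Module):
    """Illustrative affine coupling layer whose scale/shift network is
    conditioned on extra features (e.g. an encoded segmentation map).
    Hyperparameters and structure are assumptions, not Full-Glow's exact design."""

    def __init__(self, channels: int, cond_channels: int, hidden: int = 128):
        super().__init__()
        assert channels % 2 == 0, "Glow-style coupling assumes an even channel count"
        half = channels // 2
        # Predicts per-pixel log-scale and shift for the second half of the
        # activations from [first half, conditioning features].
        self.net = nn.Sequential(
            nn.Conv2d(half + cond_channels, hidden, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, channels, 3, padding=1),
        )

    def forward(self, x, cond):
        x_a, x_b = x.chunk(2, dim=1)
        log_s, t = self.net(torch.cat([x_a, cond], dim=1)).chunk(2, dim=1)
        log_s = torch.tanh(log_s)            # bound scales for numerical stability
        y_b = x_b * torch.exp(log_s) + t     # invertible affine transform
        log_det = log_s.flatten(1).sum(1)    # contribution to the flow log-likelihood
        return torch.cat([x_a, y_b], dim=1), log_det

    def inverse(self, y, cond):
        y_a, y_b = y.chunk(2, dim=1)
        log_s, t = self.net(torch.cat([y_a, cond], dim=1)).chunk(2, dim=1)
        log_s = torch.tanh(log_s)
        x_b = (y_b - t) * torch.exp(-log_s)  # exact inverse of the forward pass
        return torch.cat([y_a, x_b], dim=1)
```

In a full conditional flow, a stack of such steps is trained by maximum likelihood on real image/segmentation pairs; at sampling time, the segmentation map supplied as conditioning controls the layout of the generated scene.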