Generative modeling has recently shown great promise in computer vision, but it has mostly focused on synthesizing visually realistic images. In this paper, motivated by multi-task learning of shareable feature representations, we consider the novel problem of learning a shared generative model that is useful across various visual perception tasks. Correspondingly, we propose a general multi-task-oriented generative modeling (MGM) framework that couples a discriminative multi-task network with a generative network. While it is challenging to synthesize both RGB images and pixel-level annotations in multi-task scenarios, our framework enables us to use synthesized images paired with only weak annotations (i.e., image-level scene labels) to facilitate multiple visual tasks. Experimental evaluation on challenging multi-task benchmarks, including NYUv2 and Taskonomy, demonstrates that our MGM framework improves the performance of all tasks by large margins, consistently outperforming state-of-the-art multi-task approaches.
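To make the coupling concrete, the following is a minimal PyTorch-style sketch of the training scheme the abstract describes: a multi-task network receives full pixel-level supervision on real images, while images produced by a conditional generative network contribute only an image-level scene-classification loss. All module names, network sizes, label counts, and the loss weighting are hypothetical placeholders, not the paper's actual implementation.

```python
# Hedged sketch of the MGM-style coupling described above (assumptions only).
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    """Toy conditional generator: noise + scene label -> 32x32 RGB image."""
    def __init__(self, num_scene_classes=27, z_dim=128):
        super().__init__()
        self.embed = nn.Embedding(num_scene_classes, z_dim)
        self.fc = nn.Sequential(nn.Linear(2 * z_dim, 64 * 8 * 8), nn.ReLU())
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, z, scene_label):
        h = self.fc(torch.cat([z, self.embed(scene_label)], dim=1))
        return self.deconv(h.view(-1, 64, 8, 8))

class MultiTaskNet(nn.Module):
    """Shared encoder with dense task heads plus an image-level scene head."""
    def __init__(self, num_scene_classes=27, num_seg_classes=40):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Dense heads (e.g., segmentation, depth) need pixel-level labels.
        self.seg_head = nn.Conv2d(128, num_seg_classes, 1)
        self.depth_head = nn.Conv2d(128, 1, 1)
        # Image-level head used for the weak (scene-label) supervision.
        self.scene_head = nn.Linear(128, num_scene_classes)

    def forward(self, x):
        feats = self.encoder(x)
        pooled = feats.mean(dim=(2, 3))
        return {
            "seg": self.seg_head(feats),
            "depth": self.depth_head(feats),
            "scene": self.scene_head(pooled),
        }

def training_step(mtl_net, generator, real_imgs, seg_gt, depth_gt, scene_gt):
    """One step: full supervision on real images, weak supervision on
    synthesized images (targets assumed to match the heads' output size)."""
    ce, l1 = nn.CrossEntropyLoss(), nn.L1Loss()

    # Real images: pixel-level losses for every dense task, plus scene loss.
    out_real = mtl_net(real_imgs)
    loss_real = (ce(out_real["seg"], seg_gt)
                 + l1(out_real["depth"], depth_gt)
                 + ce(out_real["scene"], scene_gt))

    # Synthesized images conditioned on sampled scene labels: only the
    # image-level scene loss is applied, since dense labels are unavailable.
    z = torch.randn(real_imgs.size(0), 128)
    fake_scene = torch.randint(0, 27, (real_imgs.size(0),))
    fake_imgs = generator(z, fake_scene)
    loss_weak = ce(mtl_net(fake_imgs)["scene"], fake_scene)

    # The 0.1 weight is an arbitrary illustrative choice, not from the paper.
    return loss_real + 0.1 * loss_weak
```

The sketch only illustrates the division of labor implied by the abstract: the generative network supplies extra images whose sole supervision is their conditioning scene label, so the shared encoder can benefit from synthesized data without requiring synthesized pixel-level annotations.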