We propose a unified look at jointly learning multiple vision tasks and visual domains through universal representations, a single deep neural network. Learning multiple problems simultaneously involves minimizing a weighted sum of loss functions with different magnitudes and characteristics; this often results in an unbalanced state in which one loss dominates the optimization, yielding poorer results than learning a separate model for each problem. To this end, we propose distilling the knowledge of multiple task/domain-specific networks into a single deep neural network, after aligning its representations with the task/domain-specific ones through small-capacity adapters. We rigorously show that universal representations achieve state-of-the-art performance on multiple dense prediction problems in NYU-v2 and Cityscapes, on multiple image classification problems from diverse domains in the Visual Decathlon Dataset, and on cross-domain few-shot learning in MetaDataset. Finally, we also conduct multiple analyses through ablation and qualitative studies.
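The distillation-through-adapters idea above can be sketched in a few lines. This is a minimal illustration, not the paper's exact formulation: the feature dimensions, the adapter form (a single linear map per teacher), and the mean-squared alignment loss are all assumptions made for the example.

```python
import numpy as np

# Hypothetical setup: the shared ("universal") network emits a D-dim feature;
# each task/domain-specific teacher network emits features in its own space.
rng = np.random.default_rng(0)
D, T = 8, 3                                             # feature size, number of teachers
shared = rng.normal(size=(5, D))                        # batch of shared-network features
teacher_feats = [rng.normal(size=(5, D)) for _ in range(T)]

# Small-capacity adapters: one linear projection per teacher that maps the
# shared representation into that teacher's representation space.
adapters = [rng.normal(scale=0.1, size=(D, D)) for _ in range(T)]

def distill_loss(shared, teacher_feats, adapters):
    """Average per-teacher alignment loss (mean squared error after adaptation)."""
    total = 0.0
    for t_feat, W in zip(teacher_feats, adapters):
        aligned = shared @ W                            # adapt shared features to this teacher
        total += np.mean((aligned - t_feat) ** 2)
    return total / len(adapters)

loss = distill_loss(shared, teacher_feats, adapters)
```

In training, the adapters and the shared network would be optimized jointly so that a single backbone can match every teacher; averaging the per-teacher terms is one simple way to keep losses of different magnitudes from dominating one another.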