The timeline of computer vision research is marked by advances in learning and utilizing efficient contextual representations. Most of these advances, however, target model performance on a single downstream task. We consider a multi-task environment for dense prediction tasks, represented by a common backbone and independent task-specific heads. Our goal is to find the most efficient way to refine each task's prediction by capturing cross-task contexts that depend on the relations between tasks. We explore various attention-based contexts, such as global and local, in the multi-task setting and analyze their behavior when applied to refine each task independently. Empirical findings confirm that different source-target task pairs benefit from different context types. To automate the selection process, we propose an Adaptive Task-Relational Context (ATRC) module, which samples the pool of all available contexts for each task pair using neural architecture search and outputs the optimal configuration for deployment. Our method achieves state-of-the-art performance on two important multi-task benchmarks, namely NYUD-v2 and PASCAL-Context. The proposed ATRC has a low computational cost and can be used as a drop-in refinement module for any supervised multi-task architecture.
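The per-pair selection step can be illustrated with a minimal sketch. It is not the ATRC implementation: it assumes a differentiable, softmax-relaxed form of architecture search, which the abstract does not specify. The task names, the extra "none" option, the inclusion of same-task pairs, and every function name are hypothetical. Each source-target task pair holds one score per candidate context type. During search the candidates are mixed by their softmax weights, and for deployment only the highest-scoring type is kept.

```python
import math

# "global" and "local" are named in the abstract; "none" (skip this source
# task) is a hypothetical extra option added for illustration.
CONTEXT_TYPES = ["none", "global", "local"]
TASKS = ["segmentation", "depth", "normals"]  # hypothetical task set


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_logits(tasks, context_types):
    """One score vector per ordered (source, target) pair, all options equal.

    Including pairs where source == target is an assumption of this sketch.
    """
    return {(s, t): [0.0] * len(context_types) for t in tasks for s in tasks}


def mixed_context(candidate_outputs, logits):
    """Search-time relaxation: softmax-weighted sum of candidate outputs.

    candidate_outputs holds one feature vector (list of floats) per context
    type, all of the same length.
    """
    weights = softmax(logits)
    dim = len(candidate_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, candidate_outputs))
        for i in range(dim)
    ]


def derive_configuration(pair_logits, context_types):
    """Deployment: keep only the highest-scoring context type per pair."""
    config = {}
    for pair, logits in pair_logits.items():
        best = max(range(len(logits)), key=logits.__getitem__)
        config[pair] = context_types[best]
    return config
```

In a real search the score vectors would be trained jointly with the network weights. Here they are plain lists, so the selection logic can be read in isolation.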