In multi-task learning (MTL) for visual scene understanding, it is crucial to transfer useful information between tasks with minimal interference. In this paper, we propose a novel architecture that effectively transfers informative features by applying an attention mechanism to the multi-scale features of the tasks. Since applying an attention module directly to all possible feature pairs across scales and tasks incurs high complexity, we propose to apply attention modules sequentially, first across tasks and then across scales. The cross-task attention module (CTAM) is applied first to facilitate the exchange of relevant information between the features of different tasks at the same scale. The cross-scale attention module (CSAM) then aggregates useful information from feature maps at different resolutions within the same task. In addition, we capture long-range dependencies through a self-attention module in the feature extraction network. Extensive experiments demonstrate that our method achieves state-of-the-art performance on the NYUD-v2 and PASCAL-Context datasets.
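The sequential ordering described above (cross-task attention within a scale, then cross-scale attention within a task) can be illustrated with a minimal NumPy sketch. This is not the paper's CTAM/CSAM, which are learned modules with projection layers; it only shows, under assumed toy shapes, how plain scaled dot-product attention would be applied in the two stages.

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
T, S, N, d = 3, 2, 4, 8                    # tasks, scales, tokens per map, channels (toy)
feats = rng.normal(size=(T, S, N, d))      # multi-task, multi-scale features

# Stage 1 (cross-task): at each scale, every task's features attend to
# the concatenated features of all tasks at that same scale.
cross_task = np.empty_like(feats)
for s in range(S):
    pooled = feats[:, s].reshape(T * N, d)
    for t in range(T):
        cross_task[t, s] = attention(feats[t, s], pooled, pooled)

# Stage 2 (cross-scale): within each task, every scale's features attend to
# the concatenated features of all scales of that task.
out = np.empty_like(cross_task)
for t in range(T):
    pooled = cross_task[t].reshape(S * N, d)
    for s in range(S):
        out[t, s] = attention(cross_task[t, s], pooled, pooled)

print(out.shape)  # (3, 2, 4, 8)
```

Note the complexity argument: attending jointly over all task-scale pairs would cost O((T·S·N)²) per query set, whereas the two sequential stages cost O((T·N)²) and O((S·N)²) respectively.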