DeMT: Deformable Mixer Transformer for Multi-Task Learning of Dense Prediction

Convolutional neural networks (CNNs) and Transformers each have their own advantages, and both have been widely used for dense prediction in multi-task learning (MTL). Most current MTL studies rely solely on either CNNs or Transformers. In this work, we present a novel MTL model that combines the merits of deformable CNNs and query-based Transformers for multi-task learning of dense prediction. Our method, named DeMT, is based on a simple and effective encoder-decoder architecture (i.e., a deformable mixer encoder and a task-aware transformer decoder). First, the deformable mixer encoder contains two types of operators: the channel-aware mixing operator, which allows communication among different channels (i.e., efficient channel location mixing), and the spatial-aware deformable operator, which applies deformable convolution to efficiently sample more informative spatial locations (i.e., deformed features). Second, the task-aware transformer decoder consists of the task interaction block and the task query block. The former captures task interaction features via self-attention. The latter leverages the deformed features and task-interacted features to generate the task-specific feature for each task's prediction through a query-based Transformer. Extensive experiments on two dense image prediction datasets, NYUD-v2 and PASCAL-Context, demonstrate that our model uses fewer GFLOPs and significantly outperforms current Transformer- and CNN-based competitive models on a variety of metrics. The code is available at https://github.com/yangyangxu0/DeMT .
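
The data flow described in the abstract can be made concrete with a small, dependency-free Python sketch. It is not the authors' implementation (see the repository linked above for that) and it rests on several simplifying assumptions: the channel-aware mixing operator is reduced to one shared linear map per location; the spatial-aware deformable operator is reduced to a single bilinearly interpolated sample per location, with offsets supplied by hand rather than learned; attention has no learned projections or multiple heads; each task gets its own encoder parameters; and the task query block uses the deformed features as queries with the task-interacted features as keys and values. All names (`channel_mix`, `deformable_sample`, `demt_sketch`, `task_params`) are hypothetical.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def channel_mix(fmap, W):
    """Channel-aware mixing (simplified): one linear map shared by all locations."""
    return [[matvec(W, px) for px in row] for row in fmap]

def bilinear_sample(fmap, y, x):
    """Read a C-dim feature at fractional position (y, x), clamped to the map."""
    H, Wd = len(fmap), len(fmap[0])
    y = min(max(y, 0.0), H - 1.0)
    x = min(max(x, 0.0), Wd - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, Wd - 1)
    wy, wx = y - y0, x - x0
    return [
        (1 - wy) * (1 - wx) * fmap[y0][x0][c]
        + (1 - wy) * wx * fmap[y0][x1][c]
        + wy * (1 - wx) * fmap[y1][x0][c]
        + wy * wx * fmap[y1][x1][c]
        for c in range(len(fmap[0][0]))
    ]

def deformable_sample(fmap, offsets):
    """Spatial-aware step (simplified): each location reads from its own
    shifted position (i + dy, j + dx). Returns a flat list of H*W tokens."""
    H, Wd = len(fmap), len(fmap[0])
    return [
        bilinear_sample(fmap, i + offsets[i][j][0], j + offsets[i][j][1])
        for i in range(H) for j in range(Wd)
    ]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention without learned projections."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        w = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        out.append([sum(wi * v[c] for wi, v in zip(w, values))
                    for c in range(len(values[0]))])
    return out

def demt_sketch(fmap, task_params):
    """fmap: H x W x C nested lists; task_params: {task: (W, offsets)}."""
    # Encoder: channel mixing followed by deformable sampling, per task (assumed).
    deformed = {
        task: deformable_sample(channel_mix(fmap, W), offsets)
        for task, (W, offsets) in task_params.items()
    }
    # Task interaction block: self-attention over the tokens of all tasks jointly.
    all_tokens = [tok for task in deformed for tok in deformed[task]]
    interacted = attention(all_tokens, all_tokens, all_tokens)
    # Task query block: deformed features query the task-interacted features
    # (the query/key/value role assignment is an assumption of this sketch).
    return {task: attention(deformed[task], interacted, interacted)
            for task in deformed}

# Toy usage: a 2x2 map with 2 channels and two hypothetical tasks.
fmap = [[[1.0, 0.0], [0.0, 1.0]],
        [[0.5, 0.5], [1.0, 1.0]]]
identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
no_shift = [[(0.0, 0.0)] * 2 for _ in range(2)]
half_shift = [[(0.5, 0.5)] * 2 for _ in range(2)]
out = demt_sketch(fmap, {"depth": (identity, no_shift), "seg": (swap, half_shift)})
```

Each entry of `out` holds one token per spatial location for that task, which a per-task prediction head (omitted here) would map to a dense output.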
