Multi-task dense scene understanding is a thriving research domain that requires simultaneous perception and reasoning on a series of correlated tasks with pixel-wise prediction. Most existing works encounter a severe limitation: they model only local context, due to their heavy use of convolution operations, whereas learning interactions and performing inference in a global spatial context and across multiple tasks are critical for this problem. In this paper, we propose a novel end-to-end Inverted Pyramid multi-task Transformer (InvPT) that jointly models spatial positions and multiple tasks in a unified framework. To the best of our knowledge, this is the first work to explore designing a transformer structure for multi-task dense prediction in scene understanding. Moreover, although it is widely demonstrated that higher spatial resolution is remarkably beneficial for dense prediction, it is very challenging for existing transformers to go deeper at higher resolutions because of the huge computational complexity incurred by large spatial sizes. InvPT presents an efficient UP-Transformer block that learns multi-task feature interaction at gradually increasing resolutions, incorporating effective self-attention message passing and multi-scale feature aggregation to produce task-specific predictions at high resolution. Our method achieves superior multi-task performance on the NYUD-v2 and PASCAL-Context datasets and significantly outperforms previous state-of-the-art methods. The code is available at https://github.com/prismformore/InvPT
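To make the core idea concrete, below is a minimal PyTorch sketch of one upsampling transformer stage of this kind. The class name `UpTransformerBlock`, the channel-halving `reduce` projection, and the exact tensor layout are illustrative assumptions, not the paper's actual design (see the linked repository for that); the sketch only shows the pattern the abstract describes: joint self-attention over all tasks' tokens, followed by 2x upsampling of the resulting feature maps so the next stage runs at higher resolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class UpTransformerBlock(nn.Module):
    """A minimal sketch (not the official implementation) of an
    UP-Transformer-style stage: self-attention over a joint multi-task
    token sequence, then 2x spatial upsampling so the next stage
    operates at a higher resolution with fewer channels."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Halving the channel dim after upsampling is an assumed way to
        # keep attention affordable as the spatial size quadruples.
        self.reduce = nn.Linear(dim, dim // 2)

    def forward(self, x: torch.Tensor, h: int, w: int):
        # x: (B, T*H*W, C) -- tokens of T tasks flattened into one
        # sequence, so attention passes messages across both spatial
        # positions and tasks in a single operation.
        b, n, c = x.shape
        t = n // (h * w)
        y = self.norm(x)
        y, _ = self.attn(y, y, y)
        x = x + y  # residual self-attention message passing
        # Fold tokens back into T spatial maps and upsample each 2x.
        maps = x.reshape(b, t, h, w, c).permute(0, 1, 4, 2, 3).reshape(b * t, c, h, w)
        maps = F.interpolate(maps, scale_factor=2.0, mode="bilinear", align_corners=False)
        h2, w2 = 2 * h, 2 * w
        x = maps.reshape(b, t, c, h2 * w2).permute(0, 1, 3, 2).reshape(b, t * h2 * w2, c)
        return self.reduce(x), h2, w2


# Toy usage: 2 tasks, 16x16 feature maps, 64-dim tokens.
block = UpTransformerBlock(dim=64)
tokens = torch.randn(1, 2 * 16 * 16, 64)
out, h, w = block(tokens, 16, 16)  # out: (1, 2*32*32, 32), h = w = 32
```

Stacking several such stages yields the inverted-pyramid shape the name suggests: the token sequence grows with resolution while the channel width shrinks, which is one plausible way to keep the attention cost manageable at high output resolutions.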