The multi-task learning (MTL) paradigm focuses on jointly learning two or more tasks, aiming for significant improvements in a model's generalizability, performance, and training/inference memory footprint. These benefits become all the more indispensable when jointly training vision-related {\bf dense} prediction tasks. In this work, we tackle the MTL problem of two dense tasks, i.e., semantic segmentation and depth estimation, and present a novel attention module called the Cross-Channel Attention Module ({CCAM}), which facilitates effective feature sharing along each channel between the two tasks, leading to mutual performance gains with a negligible increase in trainable parameters. In a true symbiotic spirit, we then formulate a novel data augmentation for semantic segmentation that uses predicted depth, called {AffineMix}, and a simple augmentation for depth estimation that uses predicted semantics, called {ColorAug}. Finally, we validate the performance gains of the proposed method on the Cityscapes and ScanNet datasets, achieving state-of-the-art results for a semi-supervised joint model based on depth and semantic segmentation.
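To make the channel-wise feature-sharing idea concrete, below is a minimal PyTorch sketch of cross-channel attention between a segmentation branch and a depth branch. It assumes a squeeze-and-excitation-style design in which each task's pooled channel descriptor gates the other task's feature map; the class name, the \verb|gate_seg|/\verb|gate_depth| sub-modules, and the reduction ratio are illustrative assumptions, not the paper's exact CCAM architecture.
\begin{verbatim}
import torch
import torch.nn as nn

class CrossChannelAttention(nn.Module):
    """Illustrative cross-channel attention between two task branches.

    Hypothetical sketch, not the paper's exact CCAM design: each task's
    per-channel descriptor (global average pool) is passed through a small
    gating MLP and used to re-weight the OTHER task's channels, so the two
    branches exchange information at negligible parameter cost.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: B x C x 1 x 1
        self.gate_seg = nn.Sequential(       # gate computed from seg features
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
        self.gate_depth = nn.Sequential(     # gate computed from depth features
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, f_seg: torch.Tensor, f_depth: torch.Tensor):
        b, c, _, _ = f_seg.shape
        s_seg = self.pool(f_seg).view(b, c)      # channel descriptor (seg)
        s_depth = self.pool(f_depth).view(b, c)  # channel descriptor (depth)
        # Cross gating: each branch is re-weighted by the other's channels.
        f_seg = f_seg * self.gate_depth(s_depth).view(b, c, 1, 1)
        f_depth = f_depth * self.gate_seg(s_seg).view(b, c, 1, 1)
        return f_seg, f_depth
\end{verbatim}
Under this assumed design, the module adds only two small MLPs (roughly $2 \cdot 2C^2/r$ weights for $C$ channels and reduction $r$), consistent with the negligible parameter overhead claimed for CCAM.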