In this paper, we argue for the importance of considering task interactions at multiple scales when distilling task information in a multi-task learning setup. Contrary to common belief, we show that tasks with high affinity at a certain scale are not guaranteed to retain this behaviour at other scales, and vice versa. We propose a novel architecture, namely MTI-Net, that builds upon this finding in three ways. First, it explicitly models task interactions at every scale via a multi-scale multi-modal distillation unit. Second, it propagates distilled task information from lower to higher scales via a feature propagation module. Third, it aggregates the refined task features from all scales via a feature aggregation unit to produce the final per-task predictions. Extensive experiments on two multi-task dense labeling datasets show that, unlike prior work, our multi-task model delivers on the full potential of multi-task learning, that is, a smaller memory footprint, a reduced number of calculations, and better performance compared to single-task learning.
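To make the three components named above concrete, the following is a minimal, hypothetical PyTorch sketch of a multi-scale task-interaction network. All names (DistillationUnit, MTINetSketch, and the specific convolutional choices) are illustrative assumptions, not the authors' released implementation; the sketch only shows how per-scale distillation, coarse-to-fine feature propagation, and cross-scale aggregation could fit together.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DistillationUnit(nn.Module):
    """Per-scale multi-modal distillation: refine each task's features
    from the concatenated features of all tasks at that scale."""
    def __init__(self, channels, num_tasks):
        super().__init__()
        self.refine = nn.ModuleList(
            nn.Conv2d(channels * num_tasks, channels, kernel_size=3, padding=1)
            for _ in range(num_tasks)
        )

    def forward(self, task_feats):          # list of [B, C, H, W], one per task
        shared = torch.cat(task_feats, dim=1)
        return [conv(shared) for conv in self.refine]


class MTINetSketch(nn.Module):
    """Toy multi-scale task-interaction network: distillation at every scale,
    propagation of distilled features from coarser to finer scales, and
    aggregation over all scales into the final per-task predictions."""
    def __init__(self, channels, num_tasks, num_scales, out_channels):
        super().__init__()
        self.num_tasks, self.num_scales = num_tasks, num_scales
        self.distill = nn.ModuleList(DistillationUnit(channels, num_tasks)
                                     for _ in range(num_scales))
        # feature propagation: merge coarser-scale distilled features into the next scale
        self.propagate = nn.ModuleList(
            nn.Conv2d(channels * 2, channels, kernel_size=1)
            for _ in range(num_scales - 1)
        )
        # feature aggregation: fuse refined features from all scales, per task
        self.heads = nn.ModuleList(
            nn.Conv2d(channels * num_scales, out_channels, kernel_size=1)
            for _ in range(num_tasks)
        )

    def forward(self, backbone_feats):
        # backbone_feats: list of [B, C, Hi, Wi], ordered coarse -> fine;
        # for brevity the same backbone feature serves as every task's initial feature
        refined, prev = [], None             # refined[s][t] = task t at scale s
        for s, feat in enumerate(backbone_feats):
            task_feats = [feat for _ in range(self.num_tasks)]
            if prev is not None:             # propagate distilled info upward
                up = [F.interpolate(p, size=feat.shape[-2:], mode="bilinear",
                                    align_corners=False) for p in prev]
                task_feats = [self.propagate[s - 1](torch.cat([f, u], dim=1))
                              for f, u in zip(task_feats, up)]
            prev = self.distill[s](task_feats)
            refined.append(prev)
        # aggregate all scales at the finest resolution for each task
        size = backbone_feats[-1].shape[-2:]
        outputs = []
        for t in range(self.num_tasks):
            per_scale = [F.interpolate(refined[s][t], size=size, mode="bilinear",
                                       align_corners=False)
                         for s in range(self.num_scales)]
            outputs.append(self.heads[t](torch.cat(per_scale, dim=1)))
        return outputs


# usage: two tasks, three scales, 64-channel features
net = MTINetSketch(channels=64, num_tasks=2, num_scales=3, out_channels=1)
feats = [torch.randn(1, 64, 16, 16), torch.randn(1, 64, 32, 32),
         torch.randn(1, 64, 64, 64)]
preds = net(feats)                           # list of two [1, 1, 64, 64] tensors
```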