Prior work in multi-task learning has mainly focused on predictions from a single image. In this work, we present a new approach for multi-task learning from videos via efficient inter-frame local attention (MILA). Our approach contains a novel inter-frame attention module that learns task-specific attention across frames. We embed the attention module in a ``slow-fast'' architecture, where the slower deep network runs on sparsely sampled keyframes and a lightweight shallow network runs on the non-keyframes at a high frame rate. We also propose an effective adversarial learning strategy that encourages the slow and fast networks to learn similar features. Our approach ensures low-latency multi-task learning while maintaining high-quality predictions. Experiments show accuracy competitive with the state of the art on two multi-task learning benchmarks while reducing the number of floating-point operations (FLOPs) by up to 70\%. In addition, our attention-based feature propagation method (ILA) outperforms prior work in task accuracy while reducing FLOPs by up to 90\%.
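To make the mechanism concrete, the sketch below shows one plausible PyTorch realization of inter-frame local attention, in which shallow non-keyframe features query a local window of deep keyframe features. The module name, the window size, and the fusion details are our own illustrative assumptions, not the authors' released implementation.

\begin{verbatim}
# Minimal sketch of inter-frame local attention (assumed design,
# not the authors' implementation). Each spatial position of the
# shallow non-keyframe features attends to a local window of the
# deep keyframe features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterFrameLocalAttention(nn.Module):
    def __init__(self, channels: int, window: int = 7):
        super().__init__()
        self.window = window
        self.scale = channels ** -0.5
        self.query = nn.Conv2d(channels, channels, 1)
        self.key   = nn.Conv2d(channels, channels, 1)
        self.value = nn.Conv2d(channels, channels, 1)

    def forward(self, fast_feat, key_feat):
        # fast_feat: shallow features of current non-keyframe (B,C,H,W)
        # key_feat:  deep features of the last keyframe       (B,C,H,W)
        b, c, h, w = fast_feat.shape
        q = self.query(fast_feat).view(b, c, 1, h * w)
        # Gather a window x window neighborhood around every
        # spatial position of the keyframe feature map.
        k = F.unfold(self.key(key_feat), self.window,
                     padding=self.window // 2)
        v = F.unfold(self.value(key_feat), self.window,
                     padding=self.window // 2)
        k = k.view(b, c, self.window ** 2, h * w)
        v = v.view(b, c, self.window ** 2, h * w)
        # Dot-product attention over the local window,
        # computed independently at each position.
        attn = (q * k).sum(dim=1, keepdim=True) * self.scale
        attn = attn.softmax(dim=2)           # (B, 1, window^2, H*W)
        out = (attn * v).sum(dim=2)          # (B, C, H*W)
        return out.view(b, c, h, w)

# Usage: propagate keyframe features to a non-keyframe.
ila = InterFrameLocalAttention(channels=256)
key_feat = torch.randn(1, 256, 32, 32)   # from the slow network
fast_feat = torch.randn(1, 256, 32, 32)  # from the fast network
fused = ila(fast_feat, key_feat)         # (1, 256, 32, 32)
\end{verbatim}

Restricting attention to a local window keeps the per-frame cost linear in the number of spatial positions, which is consistent with the FLOP reductions reported above for the non-keyframe path.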