Learning to Learn Better for Video Object Segmentation

Recently, the joint learning framework (JOINT) integrated matching-based transductive reasoning and online inductive learning to achieve accurate and robust semi-supervised video object segmentation (SVOS). However, using the mask embedding as the label to guide the generation of target features in the two branches may yield an inadequate target representation and degrade performance. Moreover, how to fuse the target features of the two branches in a principled way, rather than simply adding them together, so that one dominant branch does not have an adverse effect, has not been investigated. In this paper, we propose a novel framework, termed LLB, that emphasizes Learning to Learn Better target features for SVOS. We design a discriminative label generation module (DLGM) and an adaptive fusion module to address these issues. Technically, the DLGM takes the background-filtered frame instead of the target mask as input and adopts a lightweight encoder to generate the target features, which serve as the label of the online few-shot learner and the value of the decoder in the transformer, guiding the two branches to learn a more discriminative target representation. The adaptive fusion module maintains a learnable gate for each branch, which reweighs the element-wise feature representation and lets an adaptive amount of target information from each branch flow into the fused target feature, thus preventing one branch from dominating and making the target feature more robust to distractors. Extensive experiments on public benchmarks show that our proposed LLB method achieves state-of-the-art performance.
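
The two mechanisms named in the abstract can be illustrated with a minimal sketch in plain Python. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: background filtering is taken to be an element-wise product of the frame with a binary target mask, and each branch's gate is taken to be a sigmoid over a per-element affine transform of that branch's own feature. The names `background_filter`, `gated_fusion`, and all parameters are hypothetical and are not taken from the authors' implementation.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function, used here as the (assumed) gate activation."""
    return 1.0 / (1.0 + math.exp(-x))


def background_filter(frame: Sequence[Sequence[float]],
                      mask: Sequence[Sequence[int]]) -> List[List[float]]:
    """Zero out background pixels (assumption: frame * binary mask)."""
    return [[p * m for p, m in zip(frame_row, mask_row)]
            for frame_row, mask_row in zip(frame, mask)]


def gated_fusion(f_trans: Sequence[float], f_induct: Sequence[float],
                 w_trans: Sequence[float], b_trans: Sequence[float],
                 w_induct: Sequence[float], b_induct: Sequence[float]) -> List[float]:
    """Fuse two branch features with one element-wise gate per branch.

    Assumed form: gate = sigmoid(w * f + b), fused = g_t * f_t + g_i * f_i.
    In a trained model w and b would be learned; here they are plain inputs.
    """
    fused = []
    for i in range(len(f_trans)):
        g_t = sigmoid(w_trans[i] * f_trans[i] + b_trans[i])
        g_i = sigmoid(w_induct[i] * f_induct[i] + b_induct[i])
        fused.append(g_t * f_trans[i] + g_i * f_induct[i])
    return fused


if __name__ == "__main__":
    # Keep only the pixels where the mask is 1.
    print(background_filter([[1, 2], [3, 4]], [[1, 0], [0, 1]]))
    # With zero weights and biases both gates equal 0.5.
    zeros = [0.0, 0.0]
    print(gated_fusion([2.0, -4.0], [4.0, 2.0], zeros, zeros, zeros, zeros))
```

With all gate parameters at zero, both gates equal 0.5 and the fusion reduces to a scaled version of the plain addition that the abstract argues against. Pushing one branch's bias strongly negative drives its gate toward zero, which shows how a gate of this kind could stop a branch from dominating the fused feature.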
