This paper develops a more effective method of hierarchical propagation for semi-supervised Video Object Segmentation (VOS). Built on vision transformers, the recently developed Associating Objects with Transformers (AOT) approach introduced hierarchical propagation into VOS and has shown promising results. Hierarchical propagation gradually propagates information from past frames to the current frame and transforms the current-frame feature from object-agnostic to object-specific. However, the growth of object-specific information inevitably causes a loss of object-agnostic visual information in deep propagation layers. To address this problem and further facilitate the learning of visual embeddings, this paper proposes a Decoupling Features in Hierarchical Propagation (DeAOT) approach. First, DeAOT decouples the hierarchical propagation of object-agnostic and object-specific embeddings by handling them in two independent branches. Second, to compensate for the additional computation of dual-branch propagation, we propose an efficient module for constructing hierarchical propagation, the Gated Propagation Module, which is carefully designed with single-head attention. Extensive experiments show that DeAOT significantly outperforms AOT in both accuracy and efficiency. On YouTube-VOS, DeAOT achieves 86.0% at 22.4 fps and 82.0% at 53.4 fps. Without test-time augmentations, we achieve new state-of-the-art performance on four benchmarks: YouTube-VOS (86.2%), DAVIS 2017 (86.2%), DAVIS 2016 (92.9%), and VOT 2020 (0.622). Project page: https://github.com/z-x-yang/AOT.
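To make the idea of decoupled propagation concrete, the following is a minimal sketch in plain Python, not the DeAOT implementation. It rests on several assumptions that the abstract does not state: the two branches share one single-head attention map computed from the visual (object-agnostic) embeddings, the gate is an element-wise sigmoid of the current visual embedding, and both branches use the same channel width. All function names (`attention_map`, `gated_residual`, `dual_branch_layer`, `propagate`) are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(queries, keys):
    """Single-head scaled dot-product attention weights, one row per query."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    return [
        softmax([scale * sum(q * k for q, k in zip(query, key)) for key in keys])
        for query in queries
    ]


def aggregate(attn, values):
    """Attention-weighted sum of memory values for every query token."""
    width = len(values[0])
    return [
        [sum(w * v[c] for w, v in zip(weights, values)) for c in range(width)]
        for weights in attn
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_residual(current, gate_source, propagated):
    """ASSUMPTION: element-wise sigmoid gate on the propagated signal, plus a residual."""
    return [
        [c + sigmoid(g) * p for c, g, p in zip(c_row, g_row, p_row)]
        for c_row, g_row, p_row in zip(current, gate_source, propagated)
    ]


def dual_branch_layer(cur_visual, cur_id, mem_visual, mem_id):
    """One propagation layer with separate visual and ID branches."""
    # ASSUMPTION: attention depends only on object-agnostic (visual) embeddings.
    attn = attention_map(cur_visual, mem_visual)
    # Visual branch: never reads the object-specific (ID) embeddings.
    new_visual = gated_residual(cur_visual, cur_visual, aggregate(attn, mem_visual))
    # ID branch: reuses the same attention map to propagate object-specific values.
    new_id = gated_residual(cur_id, cur_visual, aggregate(attn, mem_id))
    return new_visual, new_id, attn


def propagate(cur_visual, cur_id, mem_visual, mem_id, num_layers=2):
    """Stack layers; reusing one memory for all layers is a simplification."""
    for _layer in range(num_layers):
        cur_visual, cur_id, _attn = dual_branch_layer(
            cur_visual, cur_id, mem_visual, mem_id
        )
    return cur_visual, cur_id


if __name__ == "__main__":
    random.seed(0)

    def rand(n, d):
        return [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]

    visual, ident = propagate(rand(3, 4), rand(3, 4), rand(5, 4), rand(5, 4))
    print(len(visual), len(visual[0]), len(ident), len(ident[0]))
```

In this sketch the visual branch never reads the ID embeddings, so object-specific information cannot overwrite object-agnostic features as layers are stacked, which is the decoupling the abstract describes. Computing a single attention map per layer and reusing it in both branches is one plausible way to limit the cost of the second branch.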