We present a new semi-supervised video object segmentation (VOS) framework that processes multiple objects in a single network pass and has a dynamically scalable architecture for speed-accuracy trade-offs. State-of-the-art methods typically match and segment a single positive object, so in multi-object scenarios they must process the objects one by one, multiplying the computational cost by the number of objects. In addition, previous methods have static network architectures, which are not flexible enough to adapt to different speed-accuracy requirements. To solve these problems, we propose an Associating Objects with Scalable Transformers (AOST) approach that matches and segments multiple objects collaboratively and supports online network scalability. To match and segment multiple objects as efficiently as a single one, AOST employs an IDentification (ID) mechanism that assigns each object a unique identity and associates the objects in a shared high-dimensional embedding space. In addition, a Scalable Long Short-Term Transformer (S-LSTT) is designed to construct hierarchical multi-object associations and enable online adaptation of accuracy-efficiency trade-offs. By further introducing scalable supervision and layer-wise ID-based attention, AOST is not only more flexible but also more robust than previous methods. We conduct extensive experiments on multi-object and single-object benchmarks to evaluate AOST variants. Compared to state-of-the-art competitors, our methods maintain superior run-time efficiency while achieving better performance. Notably, we achieve new state-of-the-art performance on popular VOS benchmarks, i.e., YouTube-VOS (86.5%), DAVIS 2017 Val/Test (87.0%/84.7%), and DAVIS 2016 (93.0%). Project page: https://github.com/z-x-yang/AOT.
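The ID mechanism described above can be illustrated with a minimal sketch. It is not the authors' implementation; it only shows the general idea of giving every object in a frame its own identity vector from one shared bank, so that a multi-object mask becomes a single embedding map that one network pass could consume. All names here (`build_identity_bank`, `assign_identities`, `embed_mask`), the bank size, the embedding dimension, the random assignment, and the choice to map background pixels to a zero vector are assumptions made for illustration.

```python
import random


def build_identity_bank(num_identities, dim, seed=0):
    """Hypothetical identity bank: one random vector per identity slot."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(num_identities)]


def assign_identities(object_ids, num_identities, seed=0):
    """Give every object a distinct slot in the bank (assumed: random choice)."""
    if len(object_ids) > num_identities:
        raise ValueError("more objects than identity slots")
    rng = random.Random(seed)
    # sample() draws without replacement, so no two objects share a slot.
    slots = rng.sample(range(num_identities), len(object_ids))
    return dict(zip(object_ids, slots))


def embed_mask(mask, assignment, bank):
    """Turn a 2D mask of object ids into a 2D map of identity vectors.

    Pixels whose id has no assignment (here: background id 0) get a zero
    vector; this is an assumption of the sketch.
    """
    dim = len(bank[0])
    return [
        [list(bank[assignment[obj]]) if obj in assignment else [0.0] * dim
         for obj in row]
        for row in mask
    ]


if __name__ == "__main__":
    bank = build_identity_bank(num_identities=10, dim=4)
    assignment = assign_identities([1, 2, 3], num_identities=10)
    # A tiny 2x3 frame: 0 is background, 1..3 are object ids.
    mask = [[0, 1, 1],
            [2, 3, 1]]
    id_map = embed_mask(mask, assignment, bank)
    print(len(id_map), len(id_map[0]), len(id_map[0][0]))
```

Because all objects are encoded in the same embedding map, the cost of this step does not grow with one network pass per object, which is the efficiency argument the abstract makes.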