Online tracking of multiple objects in videos requires a strong capacity for modeling and matching object appearances. Previous methods for learning appearance embeddings mostly rely on instance-level matching without considering the temporal continuity provided by videos. We design a new instance-to-track matching objective for learning appearance embeddings that compares a candidate detection against the embeddings of the tracks persisted in the tracker. This enables us to learn not only from videos labeled with complete tracks, but also from unlabeled or partially labeled videos. We implement this learning objective in a unified form following the spirit of the contrastive loss. Experiments on multiple object tracking datasets demonstrate that our method can effectively learn discriminative appearance embeddings in a semi-supervised fashion and outperform state-of-the-art methods on representative benchmarks.
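The abstract does not spell out the loss, but an instance-to-track matching objective "in the spirit of the contrastive loss" can be sketched as a softmax over similarities between one detection embedding and the persisted track embeddings, with cross-entropy on the track the detection belongs to. This is a minimal illustrative sketch under that assumption, not the paper's actual implementation; the function name and temperature value are hypothetical.

```python
import numpy as np

def instance_to_track_loss(det_emb, track_embs, pos_idx, temperature=0.1):
    """Contrastive-style instance-to-track matching loss (illustrative sketch).

    det_emb:    (D,)   L2-normalized embedding of a candidate detection
    track_embs: (T, D) L2-normalized embeddings of the tracks kept by the tracker
    pos_idx:    index of the track this detection is labeled as belonging to
    """
    # Cosine similarity of the detection to every persisted track,
    # scaled by a temperature as is common in contrastive losses.
    logits = track_embs @ det_emb / temperature          # shape (T,)
    logits = logits - logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()        # softmax over tracks
    return -np.log(probs[pos_idx])                       # cross-entropy on the positive track

# Toy usage: a detection aligned with track 0 incurs a lower loss
# when matched to track 0 than when (wrongly) matched to track 1.
det = np.array([1.0, 0.0])
tracks = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
loss_correct = instance_to_track_loss(det, tracks, pos_idx=0)
loss_wrong = instance_to_track_loss(det, tracks, pos_idx=1)
```

Matching against track embeddings rather than individual instances is what lets partially labeled or unlabeled video contribute: any detection the tracker has already associated with a track can serve as a positive without full track annotations.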