Maintaining the identity of multiple objects in real-time video is challenging because a detector cannot always be run on every frame. Motion estimation systems are therefore often employed, but they either scale poorly with the number of targets or produce features with limited semantic information. To address these problems and enable real-time tracking of dozens of arbitrary objects, we propose SiamMOTION. SiamMOTION includes a novel proposal engine that produces high-quality features through an attention mechanism and a region-of-interest extractor fed by an inertia module and powered by a feature pyramid network. The extracted tensors then enter a comparison head that efficiently matches pairs of exemplars and search areas, generating high-quality predictions via a pairwise depthwise region proposal network and a multi-object penalization module. SiamMOTION has been validated on five public benchmarks, achieving leading performance against current state-of-the-art trackers.
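The abstract does not spell out how the comparison head matches exemplars against search areas. As a rough illustration only, the sketch below assumes that "pairwise depthwise" refers to the depthwise cross-correlation commonly used in Siamese region proposal trackers, applied once per (exemplar, search area) pair. The names `depthwise_xcorr` and `pairwise_xcorr` and the nested-list tensor layout are hypothetical and are not taken from SiamMOTION.

```python
from typing import List

# Hypothetical layout: a feature map is a nested list indexed [channel][row][col].
Tensor3D = List[List[List[float]]]


def depthwise_xcorr(search: Tensor3D, exemplar: Tensor3D) -> Tensor3D:
    """Slide each exemplar channel over the same channel of the search area.

    Channels are never mixed, so the response keeps one map per channel
    ("valid" mode: the exemplar must fit entirely inside the search area).
    """
    if len(search) != len(exemplar):
        raise ValueError("search and exemplar need the same number of channels")
    out: Tensor3D = []
    for s_ch, e_ch in zip(search, exemplar):
        kh, kw = len(e_ch), len(e_ch[0])
        oh = len(s_ch) - kh + 1
        ow = len(s_ch[0]) - kw + 1
        if oh <= 0 or ow <= 0:
            raise ValueError("exemplar must not be larger than the search area")
        response = [
            [
                sum(
                    s_ch[i + di][j + dj] * e_ch[di][dj]
                    for di in range(kh)
                    for dj in range(kw)
                )
                for j in range(ow)
            ]
            for i in range(oh)
        ]
        out.append(response)
    return out


def pairwise_xcorr(searches: List[Tensor3D], exemplars: List[Tensor3D]) -> List[Tensor3D]:
    """Assumed pairing: the k-th exemplar is compared only with the k-th search area."""
    if len(searches) != len(exemplars):
        raise ValueError("need exactly one search area per exemplar")
    return [depthwise_xcorr(s, e) for s, e in zip(searches, exemplars)]


if __name__ == "__main__":
    search = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]
    exemplar = [[[1.0, 0.0], [0.0, 1.0]]]
    print(depthwise_xcorr(search, exemplar))  # [[[6.0, 8.0], [12.0, 14.0]]]
```

In a real tracker this step would normally run as a single batched grouped convolution on the GPU rather than as Python loops. The loops are used here only to make the per-channel, per-pair matching explicit.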