Maintaining the identity of multiple objects in real-time video is challenging because running a detector on every frame is not always feasible. Motion estimation systems are therefore often employed, but they either scale poorly with the number of targets or produce features with limited semantic information. To address these problems and enable real-time tracking of dozens of arbitrary objects, we propose SiamMOTION. SiamMOTION includes a novel proposal engine that produces high-quality features through an attention mechanism and a region-of-interest extractor, which is fed by an inertia module and powered by a feature pyramid network. The extracted tensors then enter a comparison head that efficiently matches pairs of exemplars and search areas, generating high-quality predictions via a pairwise depthwise region proposal network and a multi-object penalization module. SiamMOTION has been validated on five public benchmarks, achieving leading performance against current state-of-the-art trackers. Code is available at: https://github.com/lorenzovaquero/SiamMOTION
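To illustrate what matching pairs of exemplars and search areas with a depthwise operation could look like, the following is a minimal sketch of pairwise depthwise cross-correlation, the matching operation commonly used in Siamese trackers. It is not taken from the SiamMOTION code base: the function names `depthwise_xcorr` and `pairwise_depthwise_xcorr`, the nested-list feature layout, and the plain valid-mode correlation are all assumptions made for illustration, and the abstract does not specify how the actual comparison head is implemented.

```python
from typing import List

Channel = List[List[float]]   # one feature channel: rows x cols
Features = List[Channel]      # C channels for a single exemplar or search area


def depthwise_xcorr(exemplar: Features, search: Features) -> Features:
    """Valid-mode cross-correlation computed independently per channel.

    Hypothetical helper: channel c of the exemplar slides over channel c
    of the search area only, so channels are never mixed.
    """
    if len(exemplar) != len(search):
        raise ValueError("exemplar and search need the same channel count")
    responses: Features = []
    for ex_ch, se_ch in zip(exemplar, search):
        kh, kw = len(ex_ch), len(ex_ch[0])
        sh, sw = len(se_ch), len(se_ch[0])
        if kh > sh or kw > sw:
            raise ValueError("exemplar must not be larger than search area")
        # One response value per placement of the exemplar window.
        responses.append([
            [
                sum(ex_ch[i][j] * se_ch[y + i][x + j]
                    for i in range(kh) for j in range(kw))
                for x in range(sw - kw + 1)
            ]
            for y in range(sh - kh + 1)
        ])
    return responses


def pairwise_depthwise_xcorr(exemplars: List[Features],
                             searches: List[Features]) -> List[Features]:
    """Correlate the k-th exemplar only with the k-th search area.

    Assumption: each tracked target has its own exemplar and its own
    search area, so the cost grows linearly with the number of targets.
    """
    if len(exemplars) != len(searches):
        raise ValueError("need exactly one search area per exemplar")
    return [depthwise_xcorr(e, s) for e, s in zip(exemplars, searches)]


if __name__ == "__main__":
    exemplar = [[[1.0, 0.0], [0.0, 1.0]]]                          # 1 channel, 2x2
    search = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]  # 1 channel, 3x3
    print(pairwise_depthwise_xcorr([exemplar], [search]))
```

In a real tracker this loop would normally be expressed as a single grouped convolution over batched tensors; the explicit loops here only make the per-pair, per-channel structure visible.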