Pyramid Correlation based Deep Hough Voting for Visual Object Tracking

Most existing Siamese-based trackers treat the tracking problem as a parallel task of classification and regression. However, some studies show that the sibling head structure can lead to suboptimal solutions during network training. Through experiments we find that, without regression, the performance can be equally promising as long as the network is carefully designed to suit the training objective. We introduce a novel voting-based, classification-only tracking algorithm named Pyramid Correlation based Deep Hough Voting (PCDHV for short), which jointly locates the top-left and bottom-right corners of the target. Specifically, we construct a Pyramid Correlation module to equip the embedded feature with fine-grained local structures and global spatial contexts. The Deep Hough Voting module then takes over, integrating long-range dependencies of pixels to perceive corners. In addition, the prevalent discretization gap is simply yet effectively alleviated by increasing the spatial resolution of the feature maps while exploiting channel-space relationships. The algorithm is general, robust and simple. We demonstrate the effectiveness of the modules through a series of ablation experiments. Without bells and whistles, our tracker achieves better or comparable performance to the SOTA algorithms on three challenging benchmarks (TrackingNet, GOT-10k and LaSOT) while running at a real-time speed of 80 FPS. Code and models will be released.
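
To make the discretization gap concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "exploiting channel-space relationships" can be illustrated with a depth-to-space (pixel-shuffle style) rearrangement, and that each corner is read off as the peak of a classification heatmap; the abstract confirms neither detail. The function names `depth_to_space`, `decode_corner` and `box_from_corners`, and the stride values, are hypothetical.

```python
import numpy as np


def depth_to_space(x, r):
    """Rearrange a (C*r*r, H, W) array into (C, H*r, W*r).

    Assumption: this trades channels for spatial resolution in the same
    layout as a pixel-shuffle operation; the paper may do this differently.
    """
    c_rr, h, w = x.shape
    c = c_rr // (r * r)
    x = x.reshape(c, r, r, h, w)      # (c, i, j, h, w)
    x = x.transpose(0, 3, 1, 4, 2)    # (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)


def decode_corner(heatmap, stride):
    """Map the peak cell of a 2-D heatmap to the (x, y) centre of that cell."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return (col + 0.5) * stride, (row + 0.5) * stride


def box_from_corners(tl_map, br_map, stride):
    """Combine top-left and bottom-right heatmaps into (x1, y1, x2, y2)."""
    x1, y1 = decode_corner(tl_map, stride)
    x2, y2 = decode_corner(br_map, stride)
    return x1, y1, x2, y2


def one_hot_map(size, x, y, stride):
    """Hypothetical helper: heatmap with a single peak in the cell containing (x, y)."""
    m = np.zeros((size, size))
    m[int(y // stride), int(x // stride)] = 1.0
    return m


# A corner at x = 103, y = 47 in image coordinates (illustrative values).
coarse = decode_corner(one_hot_map(32, 103, 47, stride=8), stride=8)
fine = decode_corner(one_hot_map(128, 103, 47, stride=2), stride=2)
```

Because a classification-only head can place a corner no more precisely than one cell, the worst-case error per axis is half the stride. In this toy setting, dropping the effective stride from 8 to 2 pixels shrinks that bound from 4 pixels to 1 pixel, which is the kind of gap a higher-resolution feature map reduces.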
