To track a target in a video, current visual trackers usually adopt greedy search for target localization: in each frame, the candidate region with the maximum response score is selected as the tracking result. However, we find that this may not be an optimal choice, especially in challenging tracking scenarios such as heavy occlusion and fast motion. To address this issue, we propose to maintain multiple tracking trajectories and apply a beam search strategy for visual tracking, so that the trajectory with fewer accumulated errors can be identified. Accordingly, this paper introduces a novel multi-agent reinforcement learning based beam search tracking strategy, termed BeamTracking. It is mainly inspired by the image captioning task, which takes an image as input and generates diverse descriptions using the beam search algorithm. Specifically, we formulate tracking as a sample selection problem fulfilled by multiple parallel decision-making processes, each of which aims to pick out one sample as its tracking result in each frame. Each maintained trajectory is associated with an agent that performs the decision-making and determines what actions should be taken to update related information. When all the frames are processed, we select the trajectory with the maximum accumulated score as the tracking result. Extensive experiments on seven popular tracking benchmark datasets validate the effectiveness of the proposed algorithm.
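The contrast between greedy localization and maintaining several trajectories can be illustrated with a small sketch. It is not the BeamTracking method itself, which relies on learned multi-agent decision-making; it is a plain beam search over hand-written candidates, and everything in it is an assumption made for illustration: the function name `beam_track`, the one-dimensional candidate positions, the toy response scores, and the `jump_penalty` term that discourages large position changes between frames.

```python
from typing import List, Tuple

# Hypothetical candidate format: (position, response score).
# Positions are 1-D here purely to keep the example short.
Candidate = Tuple[float, float]


def beam_track(
    frames: List[List[Candidate]],
    beam_width: int = 3,
    jump_penalty: float = 1.0,
) -> Tuple[float, List[float]]:
    """Return (accumulated score, positions) of the best trajectory.

    beam_width=1 reduces to greedy, frame-by-frame selection.
    jump_penalty is an assumed motion-consistency term, not from the paper.
    """
    # Each beam entry is (accumulated score, chosen positions so far).
    beams: List[Tuple[float, List[float]]] = [(0.0, [])]
    for candidates in frames:
        expanded = []
        for total, path in beams:
            for pos, score in candidates:
                step = score
                if path:
                    # Penalize large jumps from the previous position.
                    step -= jump_penalty * abs(pos - path[-1])
                expanded.append((total + step, path + [pos]))
        # Keep only the beam_width highest-scoring trajectories.
        expanded.sort(key=lambda b: b[0], reverse=True)
        beams = expanded[:beam_width]
    return beams[0]


# Toy sequence: a distractor at position 0 scores highest in the first
# frame only; the true target stays at position 5 throughout.
frames = [
    [(0.0, 0.9), (5.0, 0.8)],
    [(0.0, 0.1), (5.0, 0.9)],
    [(0.0, 0.1), (5.0, 0.9)],
]

greedy_score, greedy_path = beam_track(frames, beam_width=1)
beam_score, beam_path = beam_track(frames, beam_width=2)
print(greedy_path)  # [0.0, 0.0, 0.0]
print(beam_path)    # [5.0, 5.0, 5.0]
```

In this toy sequence, the greedy run commits to the distractor in the first frame and never recovers, while a beam width of 2 keeps the second-best candidate alive long enough for its accumulated score to overtake the first.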