Template-matching methods for visual tracking have recently gained popularity because of their competitive performance and fast speed. However, they lack an effective way to adapt to changes in the target object's appearance, so their tracking accuracy remains far from the state of the art. In this paper, we propose a dynamic memory network that adapts the template to the target's appearance variations during tracking. An LSTM serves as the memory controller: its input is the search feature map, and its outputs are the control signals for reading from and writing to the memory block. Because the target's location in the search feature map is initially unknown, an attention mechanism is applied to concentrate the LSTM input on the potential target. To prevent overly aggressive model adaptation, we apply gated residual template learning to control how much of the retrieved memory is combined with the initial template. Unlike tracking-by-detection methods, where the object's information is maintained in the weight parameters of neural networks and adaptation requires expensive online fine-tuning, our tracker runs completely feed-forward and adapts to the target's appearance changes by updating the external memory. Moreover, unlike in other trackers, the capacity of our model is not determined by the network size; it can easily be enlarged as the memory requirements of a task increase, which is favorable for memorizing long-term object information. Extensive experiments on OTB and VOT demonstrate that our tracker, MemTrack, performs favorably against state-of-the-art tracking methods while retaining a real-time speed of 50 fps.
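The abstract names three operations without spelling them out: attention over the search feature map, reading from and writing to an external memory, and a gated residual combination of retrieved memory with the initial template. The NumPy sketch below illustrates one plausible reading of each. It is not the authors' MemTrack implementation. The dot-product attention query, cosine-similarity read addressing, NTM-style erase/add write, and flattened vector templates are all assumptions chosen for illustration. The control signals that the paper obtains from the LSTM controller (attention query, read key and strength, write weights, erase/add vectors, gate logits) are passed in directly as hypothetical inputs.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def attend(search_feats: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Soft attention over an (N, C) search feature map, N = H*W positions.

    Assumption: each position is scored by a dot product with `query`.
    Returns the attention-weighted (C,) summary that would feed the LSTM.
    """
    weights = softmax(search_feats @ query)  # (N,) weights over positions
    return weights @ search_feats            # (C,) weighted feature vector


def read_memory(memory: np.ndarray, key: np.ndarray, strength: float) -> np.ndarray:
    """Content-based read from an (S, D) memory of S slots.

    Assumption: cosine similarity between `key` and each slot, sharpened
    by `strength` and normalized with a softmax, gives the read weights.
    """
    eps = 1e-8
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + eps
    weights = softmax(strength * (memory @ key) / norms)  # (S,)
    return weights @ memory                               # (D,) retrieved template


def write_memory(memory: np.ndarray, weights: np.ndarray,
                 erase: np.ndarray, add: np.ndarray) -> np.ndarray:
    """NTM-style write (an assumption): partially erase each slot, then add."""
    memory = memory * (1.0 - np.outer(weights, erase))
    return memory + np.outer(weights, add)


def gated_residual_template(initial: np.ndarray, retrieved: np.ndarray,
                            gate_logits: np.ndarray) -> np.ndarray:
    """Initial template plus the retrieved memory scaled by a sigmoid gate."""
    gate = 1.0 / (1.0 + np.exp(-gate_logits))  # values in (0, 1)
    return initial + gate * retrieved


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    search = rng.standard_normal((25, 8))           # 5x5 search map, 8 channels
    lstm_input = attend(search, rng.standard_normal(8))
    memory = rng.standard_normal((4, 16))           # 4 slots, flattened templates
    retrieved = read_memory(memory, memory[2], strength=10.0)
    template = gated_residual_template(rng.standard_normal(16), retrieved,
                                       gate_logits=np.full(16, -2.0))
    memory = write_memory(memory, softmax(rng.standard_normal(4)),
                          erase=np.full(16, 0.5), add=template)
    print(lstm_input.shape, template.shape, memory.shape)
```

Because the gate scales only the retrieved term, a gate near zero falls back to the initial template. This is one way to read the abstract's point that gating keeps the model from adapting too aggressively. Growing the number of memory slots `S` enlarges capacity without changing any weight shapes, which mirrors the abstract's claim that capacity is decoupled from network size.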