STRCF for Visual Object Tracking

May 29, 2018 · Statistical Learning and Visual Computing Group

Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking


Paper: https://arxiv.org/abs/1803.08679


CVPR2018


1. Overview and Contributions


  • By adding a temporal regularizer, the authors turn SRDCF, which is trained on multiple historical samples, into STRCF, which is trained on a single sample. This fixes SRDCF's lack of real-time speed and low efficiency, and also improves on SRDCF's accuracy.

  • As an extension of online Passive-Aggressive (PA) learning, STRCF is passive about updating the filter when a new instance arrives, but aggressive about classifying the new instance correctly. This makes STRCF's appearance model more robust when the target's appearance changes drastically.

  • The optimization problem is solved with ADMM: few iterations are needed, it is fast, and each subproblem has a closed-form solution.

  • With deep features, STRCF is competitive with state-of-the-art methods.


2. From DCF to SRDCF


(1) Discriminative correlation filter (DCF) trackers are limited by the boundary-effect problem.


  • The boundary effect arises because, in correlation filter methods, training samples are generated by circularly shifting the base sample (Fig. b). The left and right edges (and the top and bottom edges) of the image are thereby joined together, and the periodic extension implied by the Fourier transform introduces noise at these artificial boundaries.



  • Multiplying such samples by a cosine window (which pushes pixel values near the image border toward zero, and also helps emphasize the target near the center) mitigates this noise. However, the cosine window introduces several problems:


  • It increases the amount of computation.

  • It limits the filter's ability to learn from the background.

  • When the target is about to move out of the frame, the cosine window cancels part of the valid target pixels; once the target has partially moved out, even fewer effective pixels are available, which hurts the filter's learning.
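As a concrete illustration, here is a minimal NumPy sketch of the cosine (Hann) windowing step described above (the 64×64 patch size and the constant test patch are illustrative, not from the paper):

```python
import numpy as np

def cosine_window(h, w):
    """2-D cosine (Hann) window as the outer product of two 1-D Hann windows."""
    return np.outer(np.hanning(h), np.hanning(w))

# A constant patch stands in for one feature channel of a training sample.
patch = np.ones((64, 64))
windowed = patch * cosine_window(64, 64)

# Border pixels are driven to zero while centre pixels are almost untouched,
# which is exactly why part of the target is lost when it sits near the border.
```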


(2) By imposing a spatial penalty, SRDCF alleviates the boundary effect and improves DCF performance, but at a much higher computational cost; the speed drops so much that the efficiency advantage of discriminative tracking is lost.

  • The SRDCF objective is

        ε(f) = Σ_{k=1}^{t} α_k · || Σ_{d=1}^{D} x_k^d * f^d − y_k ||² + Σ_{d=1}^{D} || w · f^d ||²

    where w is the spatial regularization matrix, f is the filter, and α_k is the weight of the target features from frame k: the closer a frame is to the current one, the larger its weight. SRDCF is slow for two reasons: the spatial regularization breaks the circulant matrix structure, and the chosen optimization algorithm (Gauss-Seidel) is not efficient.
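The shape of SRDCF's spatial regularizer can be sketched as a quadratic weight map that is small over the target centre and grows toward the borders (the quadratic form matches the paper's description; the constants mu and sigma below are illustrative, not the paper's settings):

```python
import numpy as np

def spatial_reg_weights(h, w, mu=0.1, sigma=3.0):
    """Quadratic spatial regularization weights: cheap filter coefficients
    near the centre (on the target), heavily penalized ones near the borders
    (on the background). mu and sigma are illustrative constants."""
    ys = np.arange(h) - (h - 1) / 2.0
    xs = np.arange(w) - (w - 1) / 2.0
    Y, X = np.meshgrid(ys, xs, indexing="ij")
    return mu + sigma * ((Y / h) ** 2 + (X / w) ** 2)

w_map = spatial_reg_weights(64, 64)
# Corners carry the largest penalty, the centre the smallest.
```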


3. STRCF




  • Inspired by the PA algorithm, STRCF adds a temporal regularization term (μ/2)·||f − f_{t−1}||² to the loss; experiments show that μ = 16 works best.

  • During occlusion, SRDCF overfits severely because the corrupted samples from the last few frames carry large weights; STRCF's temporal regularization term makes it insensitive to sudden model changes, which alleviates this problem.
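Putting the two bullets together, the STRCF objective for frame t, as formulated in the paper, augments a single-sample SRDCF loss with the temporal term:

```latex
\arg\min_{f}\;
\frac{1}{2}\Big\|\sum_{d=1}^{D} x_t^{d} * f^{d} - y\Big\|^{2}
+ \frac{1}{2}\sum_{d=1}^{D}\big\|w \cdot f^{d}\big\|^{2}
+ \frac{\mu}{2}\,\big\|f - f_{t-1}\big\|^{2}
```

Here x_t are the D-channel features of the current frame, w the spatial weights inherited from SRDCF, and the last term pulls the new filter toward the previous filter f_{t−1}.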



The ADMM optimization method


Consider an optimization problem of the form:

min f(x) + g(z)  s.t.  Ax + Bz = c

Its augmented Lagrangian form is:

L(x, z, λ) = f(x) + g(z) + λᵀ(Ax + Bz − c) + (ρ/2)·||Ax + Bz − c||²

The iterations are:

x^(k+1) = argmin_x L(x, z^(k), λ^(k))
z^(k+1) = argmin_z L(x^(k+1), z, λ^(k))
λ^(k+1) = λ^(k) + ρ·(A·x^(k+1) + B·z^(k+1) − c)
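The scheme above can be exercised on a toy problem. The sketch below is not from the paper: it applies ADMM to min 1/2·||x − a||² + lam·||z||₁ subject to x = z, where, just as in STRCF, every sub-step has a closed-form solution (u is the scaled multiplier λ/ρ):

```python
import numpy as np

def soft(v, k):
    """Soft-thresholding: the closed-form prox of k * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def admm_toy(a, lam, rho=1.0, iters=200):
    """ADMM for: min_x 1/2||x - a||^2 + lam*||z||_1  s.t.  x = z.
    u is the scaled dual variable (multiplier / rho); both sub-steps
    are closed-form, mirroring the x / z / multiplier updates above."""
    x = np.zeros_like(a)
    z = np.zeros_like(a)
    u = np.zeros_like(a)
    for _ in range(iters):
        x = (a + rho * (z - u)) / (1.0 + rho)  # quadratic x-subproblem
        z = soft(x + u, lam / rho)             # l1 z-subproblem
        u = u + x - z                          # multiplier (dual) update
    return z

result = admm_toy(np.array([3.0, -0.5, 1.0]), lam=1.0)
# Converges to the soft-thresholding of a at lam, i.e. [2, 0, 0].
```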

Mapping this onto STRCF, introduce an auxiliary variable g with the constraint f = g.

The augmented Lagrangian form of the STRCF loss is:

L(f, g, s) = 1/2·|| Σ_{d=1}^{D} x_t^d * f^d − y ||² + 1/2·Σ_{d=1}^{D} || w · g^d ||² + (μ/2)·||f − f_{t−1}||² + sᵀ(f − g) + (γ/2)·||f − g||²    (s.t.  f = g)

where s is the Lagrange multiplier and γ the penalty factor.

ADMM then solves this iteratively, alternating between the f-subproblem, the g-subproblem, and the update of s.


All three subproblems have closed-form solutions, and the authors state in the paper that, empirically, two ADMM iterations are generally enough.


  • Subproblem f: first transform the objective into the Fourier domain. Because the j-th element of the label depends only on the j-th elements of the features and the filter across all D channels, the filter and features can be split per pixel, giving one small D × D linear system over the channels for each pixel j.

        Setting the derivative of this per-pixel objective to zero yields the solution,

        whose system matrix is the rank-1 outer product of the j-th feature vector across channels, plus a scaled identity contributed by the μ and γ terms.

        Because that data term has rank 1, the Sherman-Morrison formula gives the inverse, and hence the solution, in closed form.

  • Subproblem g: it has a closed-form solution; since the spatial regularization matrix W = diag(w) is diagonal, the matrix (WᵀW + γI) can be inverted elementwise.

  • Update of the penalty factor γ: the standard ADMM schedule γ^(i+1) = min(γ_max, ρ·γ^(i)).
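A quick numerical check of the Sherman-Morrison identity used in the f-subproblem, on a random diagonal-plus-rank-1 matrix (the dimension D = 4 and the random values are illustrative):

```python
import numpy as np

# Sherman-Morrison:
#   (A + u v^T)^{-1} = A^{-1} - (A^{-1} u v^T A^{-1}) / (1 + v^T A^{-1} u)
# The per-pixel system in the f-subproblem is exactly "scaled identity +
# rank-1", so each pixel can be solved in O(D) instead of O(D^3).
rng = np.random.default_rng(0)
D = 4
A = np.diag(rng.uniform(1.0, 2.0, D))   # diagonal part (regularization terms)
u = rng.standard_normal((D, 1))         # rank-1 factor (per-pixel feature vector)
v = u                                   # symmetric rank-1 update, real case

A_inv = np.diag(1.0 / np.diag(A))
sm_inv = A_inv - (A_inv @ u @ v.T @ A_inv) / (1.0 + (v.T @ A_inv @ u).item())

direct = np.linalg.inv(A + u @ v.T)     # agrees with the direct inverse
```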


4. Experiments


(1) OTB-2015

   Deep features are taken from conv3 of VGG-M.


(2) Comparison between hand-crafted-feature and deep-feature methods


(3) Per-attribute experiments on the videos


(4) VOT2016

