Current convolutional neural network algorithms for video object tracking spend the same amount of computation on every object and video frame. However, some frames are harder to track than others, due to varying amounts of clutter, scene complexity, and motion, and to the object's distinctiveness against its background. We propose a depth-adaptive convolutional Siamese network that performs video tracking adaptively at multiple neural network depths. Parametric gating functions are trained to control the depth of the convolutional feature extractor by minimizing a joint loss of computational cost and tracking error. Our network achieves accuracy comparable to the state of the art on the VOT2016 benchmark. Furthermore, our adaptive depth computation achieves higher accuracy for a given computational cost than traditional fixed-structure neural networks. The presented framework extends to other tasks that use convolutional neural networks and enables trading speed for accuracy at runtime.
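The adaptive-depth idea can be sketched as follows. This is a minimal illustration, not the paper's actual architecture: the matrix-multiply "stages" stand in for convolutional blocks, the scalar gate form and the `cost_weight` trade-off parameter are assumptions introduced here for clarity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_depth_forward(x, stage_weights, gate_params, threshold=0.5):
    """Run a stack of feature-extraction stages; a parametric gate after
    each stage decides whether the features are already good enough to
    halt, so easy frames exit early and hard frames go deeper.
    Returns the features and the number of stages actually executed."""
    feats = x
    depth = 0
    for W, (a, b) in zip(stage_weights, gate_params):
        feats = np.tanh(feats @ W)  # stand-in for one conv block
        depth += 1
        # illustrative gate: scalar halting score from mean activation
        halt_prob = sigmoid(a * np.abs(feats).mean() + b)
        if halt_prob > threshold:
            break
    return feats, depth

def joint_loss(tracking_error, depth, max_depth, cost_weight=0.1):
    """Joint objective from the abstract: tracking error plus a penalty
    proportional to the fraction of the network that was executed."""
    return tracking_error + cost_weight * depth / max_depth
```

At training time the gate parameters would be optimized through this joint loss (e.g. with a continuous relaxation of the halting decision), so that the learned policy spends extra depth only on frames where it reduces tracking error.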