Inspired by the fact that human eyes continue to develop tracking ability in early and middle childhood, we propose to use tracking as a proxy task for a computer vision system to learn the visual representations. Modelled on the Catch game played by the children, we design a Catch-the-Patch (CtP) game for a 3D-CNN model to learn visual representations that would help with video-related tasks. In the proposed pretraining framework, we cut an image patch from a given video and let it scale and move according to a pre-set trajectory. The proxy task is to estimate the position and size of the image patch in a sequence of video frames, given only the target bounding box in the first frame. We discover that using multiple image patches simultaneously brings clear benefits. We further increase the difficulty of the game by randomly making patches invisible. Extensive experiments on mainstream benchmarks demonstrate the superior performance of CtP against other video pretraining methods. In addition, CtP-pretrained features are less sensitive to domain gaps than those trained by a supervised action recognition task. When both trained on Kinetics-400, we are pleasantly surprised to find that CtP-pretrained representation achieves much higher action classification accuracy than its fully supervised counterpart on Something-Something dataset. Code is available online: github.com/microsoft/CtP.


翻译:受3D-CNN模型的启发,人类的眼睛继续在幼儿和中童年阶段发展跟踪能力,我们提议将跟踪作为计算机视觉系统的一个代理任务,以学习视觉显示。在儿童玩的Catch游戏上,我们以儿童玩的Catch游戏为模型设计了一个3D-CNN模型的Catch-Patch(CtP)游戏,以学习有助于视频相关任务的视觉表现。在拟议的培训前框架中,我们从给定视频中剪切除一个图像补丁,让它缩放,并按照预设的轨迹移动。代理的任务是在视频框序列中估计图像补丁的位置和大小,只考虑到第一个框中的目标捆绑框。我们发现,使用多个图像补丁同时带来明显的好处。我们通过随机地将补丁变成隐形,进一步增加了游戏的难度。关于主流基准的广泛实验显示了CtP相对于其他视频预培训方法的优异性表现。此外,CtP受限制的功能对域差距比受监督的行动识别任务要敏感。在Kinitics-400中,我们很惊讶地发现,我们发现使用多个图像补置的C-crofolticregill exexexexexlaction action:我们完全的C-creduction action action action.

0
下载
关闭预览

相关内容

iOS 8 提供的应用间和应用跟系统的功能交互特性。
  • Today (iOS and OS X): widgets for the Today view of Notification Center
  • Share (iOS and OS X): post content to web services or share content with others
  • Actions (iOS and OS X): app extensions to view or manipulate inside another app
  • Photo Editing (iOS): edit a photo or video in Apple's Photos app with extensions from a third-party apps
  • Finder Sync (OS X): remote file storage in the Finder with support for Finder content annotation
  • Storage Provider (iOS): an interface between files inside an app and other apps on a user's device
  • Custom Keyboard (iOS): system-wide alternative keyboards

Source: iOS 8 Extensions: Apple’s Plan for a Powerful App Ecosystem
【Google】平滑对抗训练,Smooth Adversarial Training
专知会员服务
49+阅读 · 2020年7月4日
100+篇《自监督学习(Self-Supervised Learning)》论文最新合集
专知会员服务
165+阅读 · 2020年3月18日
强化学习的Unsupervised Meta-Learning
CreateAMind
18+阅读 · 2019年1月7日
Unsupervised Learning via Meta-Learning
CreateAMind
42+阅读 · 2019年1月3日
Disentangled的假设的探讨
CreateAMind
9+阅读 · 2018年12月10日
disentangled-representation-papers
CreateAMind
26+阅读 · 2018年9月12日
STRCF for Visual Object Tracking
统计学习与视觉计算组
14+阅读 · 2018年5月29日
Hierarchical Disentangled Representations
CreateAMind
4+阅读 · 2018年4月15日
Arxiv
5+阅读 · 2018年4月30日
VIP会员
相关资讯
强化学习的Unsupervised Meta-Learning
CreateAMind
18+阅读 · 2019年1月7日
Unsupervised Learning via Meta-Learning
CreateAMind
42+阅读 · 2019年1月3日
Disentangled的假设的探讨
CreateAMind
9+阅读 · 2018年12月10日
disentangled-representation-papers
CreateAMind
26+阅读 · 2018年9月12日
STRCF for Visual Object Tracking
统计学习与视觉计算组
14+阅读 · 2018年5月29日
Hierarchical Disentangled Representations
CreateAMind
4+阅读 · 2018年4月15日
Top
微信扫码咨询专知VIP会员