Spatial redundancy widely exists in visual recognition tasks: the discriminative features in an image or video frame usually correspond to only a subset of pixels, while the remaining regions are irrelevant to the task at hand. Static models that process all pixels with an equal amount of computation therefore incur considerable redundancy in both time and memory consumption. In this paper, we formulate image recognition as a sequential coarse-to-fine feature learning process, mimicking the human visual system. Specifically, the proposed Glance and Focus Network (GFNet) first extracts a quick global representation of the input image at a low resolution, and then strategically attends to a series of salient (small) regions to learn finer features. This sequential process naturally facilitates adaptive inference at test time: it can be terminated once the model is sufficiently confident about its prediction, avoiding further redundant computation. Notably, locating the discriminative regions is formulated as a reinforcement learning task, so no manual annotations beyond the classification labels are required. GFNet is general and flexible, as any off-the-shelf backbone model (e.g., MobileNets, EfficientNets and TSM) can be conveniently deployed as its feature extractor. Extensive experiments on a variety of image classification and video recognition tasks, with various backbone models, demonstrate the remarkable efficiency of our method. For example, it reduces the average latency of the highly efficient MobileNet-V3 on an iPhone XS Max by 1.3x without sacrificing accuracy. Code and pre-trained models are available at https://github.com/blackfeather-wang/GFNet-Pytorch.
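To make the glance-then-focus procedure concrete, the sketch below illustrates the adaptive inference loop described above: classify a down-sampled "glance" of the image, then repeatedly crop high-resolution patches and refine the prediction until a confidence threshold is reached. This is a minimal illustration under stated assumptions, not the actual GFNet implementation from the linked repository; `backbone`, `classifier`, and `policy` are placeholder callables, `crop_patch` is a hypothetical helper, and logit averaging stands in for GFNet's learned feature fusion.

```python
import torch
import torch.nn.functional as F


def crop_patch(image, cx, cy, size):
    """Crop a (size x size) patch whose top-left corner follows the
    normalised location (cx, cy) in [0, 1]. Hypothetical helper."""
    _, _, h, w = image.shape
    top = int(cy * max(h - size, 0))
    left = int(cx * max(w - size, 0))
    return image[:, :, top:top + size, left:left + size]


def glance_and_focus_inference(backbone, classifier, policy, image,
                               glance_size=96, patch_size=96,
                               max_steps=5, threshold=0.9):
    """Sketch of GFNet-style adaptive inference for a single image
    (batch size 1 assumed for the confidence check)."""
    # "Glance": quick global prediction from a down-sampled copy of the image.
    glance = F.interpolate(image, size=(glance_size, glance_size),
                           mode='bilinear', align_corners=False)
    feature = backbone(glance)
    logits = classifier(feature)

    # "Focus": keep attending to high-resolution patches until confident.
    for _ in range(max_steps):
        confidence = F.softmax(logits, dim=1).max().item()
        if confidence >= threshold:
            break  # early exit: further computation would be redundant

        # In GFNet the next patch location comes from an RL-trained policy;
        # here `policy` is any callable returning a normalised (cx, cy).
        cx, cy = policy(feature)
        patch = crop_patch(image, cx, cy, patch_size)

        # Re-use the same backbone on the patch and fuse predictions
        # (simple logit averaging as a stand-in for the paper's fusion).
        feature = backbone(patch)
        logits = (logits + classifier(feature)) / 2

    return logits
```

The early-exit check is what yields the reported latency savings: easy inputs terminate after the glance step alone, while only hard inputs pay for additional focus steps.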