Conventional video models rely on a single stream to capture complex spatial-temporal features. Recent work on two-stream video models, such as the SlowFast network and AssembleNet, prescribes separate streams to learn complementary features and achieves stronger performance. However, manually designing both streams as well as the in-between fusion blocks is a daunting task that requires exploring a tremendously large design space. Such manual exploration is time-consuming and, when computational resources are limited and the exploration is insufficient, often ends up with sub-optimal architectures. In this work, we present a pragmatic neural architecture search approach that can efficiently search for two-stream video models in giant spaces. We design a multivariate search space with 6 search variables to capture a wide variety of choices in designing two-stream models. Furthermore, we propose a progressive search procedure that searches for the architecture of the individual streams, the fusion blocks, and the attention blocks one after the other. We demonstrate that two-stream models with significantly better performance can be automatically discovered in our design space. Our searched two-stream models, namely Auto-TSNet, consistently outperform other models on standard benchmarks. On Kinetics, compared with the SlowFast model, our Auto-TSNet-L model reduces FLOPS by nearly 11 times while achieving the same accuracy of 78.9%. On Something-Something-V2, Auto-TSNet-M improves accuracy by at least 2% over other methods that use less than 50 GFLOPS per video.
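To make the progressive search procedure concrete, below is a minimal Python sketch of stage-by-stage search under toy assumptions: the search spaces, the `evaluate` stand-in, and all variable names are hypothetical illustrations, not the paper's actual search variables or training pipeline. The point it shows is that each stage is searched with the choices from earlier stages frozen.

```python
import itertools
import random

# Toy per-stage search spaces; the variables here are illustrative only
# and are NOT the paper's 6 actual search variables.
STREAM_SPACE = {"depth": [18, 50], "frame_rate": [4, 8], "width": [0.5, 1.0]}
FUSION_SPACE = {"fusion_type": ["sum", "concat"], "fusion_stages": [2, 3]}
ATTENTION_SPACE = {"attention": ["none", "channel", "temporal"]}

def evaluate(config):
    # Stand-in for training and validating a candidate two-stream model;
    # a real NAS system would return validation accuracy here.
    random.seed(str(sorted(config.items())))
    return random.random()

def search_stage(space, fixed):
    """Score every choice in one stage's space, keeping earlier stages frozen
    inside `fixed`, and return the best combined configuration."""
    keys = list(space)
    best, best_score = None, float("-inf")
    for values in itertools.product(*(space[k] for k in keys)):
        candidate = dict(fixed, **dict(zip(keys, values)))
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best

# Progressive search: individual streams -> fusion blocks -> attention blocks.
config = {}
for space in (STREAM_SPACE, FUSION_SPACE, ATTENTION_SPACE):
    config = search_stage(space, config)
print(config)
```

In this sketch each stage is an exhaustive sweep for clarity; an actual system would substitute a more efficient search strategy per stage, but the freeze-then-search ordering is the same.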