Although vital to computer vision systems, few-shot action recognition remains immature despite extensive research on few-shot image classification. Popular few-shot learning algorithms extract a transferable embedding from seen classes and reuse it on unseen classes by constructing a metric-based classifier. One main obstacle to applying these algorithms to action recognition is the complex structure of videos. Some existing solutions sample frames from a video and aggregate their embeddings into a video-level representation, neglecting important temporal relations. Others perform an explicit sequence matching between two videos and define their distance as the matching cost, imposing overly strong restrictions on sequence ordering. In this paper, we propose Compromised Metric via Optimal Transport (CMOT) to combine the advantages of these two solutions. CMOT simultaneously considers semantic and temporal information in videos under the Optimal Transport framework, and is discriminative for both content-sensitive and ordering-sensitive tasks. In detail, given two videos, we sample segments from them and cast the calculation of their distance as an optimal transport problem between the two segment sequences. To preserve the inherent temporal ordering information, we additionally amend the ground cost matrix by penalizing it with the positional distance between each pair of segments. Empirical results on benchmark datasets demonstrate the superiority of CMOT.
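The distance computation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the segment embeddings, the cosine-distance ground cost, the linear positional penalty with weight `lam`, and the entropic Sinkhorn solver with uniform marginals are all assumptions chosen for a self-contained example.

```python
import numpy as np

def cmot_distance(x, y, lam=0.1, eps=0.05, n_iters=200):
    """Hypothetical sketch of a CMOT-style video distance.

    x, y: segment-embedding sequences of shape [n, d] and [m, d]
    (rows assumed L2-normalized). Returns the transport cost under a
    ground cost that mixes semantic and positional distance.
    """
    n, m = x.shape[0], y.shape[0]
    # Semantic ground cost: cosine distance between segment embeddings.
    c_sem = 1.0 - x @ y.T
    # Temporal penalty: distance between normalized segment positions,
    # so that matching temporally distant segments is discouraged.
    pos_x = np.arange(n)[:, None] / max(n - 1, 1)
    pos_y = np.arange(m)[None, :] / max(m - 1, 1)
    cost = c_sem + lam * np.abs(pos_x - pos_y)
    # Entropic OT via Sinkhorn iterations with uniform marginals.
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-cost / eps)
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]  # approximate transport plan
    return float((plan * cost).sum())
```

Because the positional term penalizes out-of-order matchings, two videos with identical segment content but reversed ordering yield a larger distance than two identically ordered ones, which is the intended compromise between pure content aggregation and strict sequence matching.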