Channel-Temporal Attention for First-Person Video Domain Adaptation

Unsupervised Domain Adaptation (UDA) can transfer knowledge from labeled source data to unlabeled target data of the same categories. However, UDA for first-person action recognition is an under-explored problem, with a lack of datasets and limited consideration of first-person video characteristics. This paper focuses on addressing this problem. Firstly, we propose two small-scale first-person video domain adaptation datasets: ADL$_{small}$ and GTEA-KITCHEN. Secondly, we introduce channel-temporal attention blocks to capture the channel-wise and temporal-wise relationships and to model their inter-dependencies, which are important to first-person vision. Finally, we propose a Channel-Temporal Attention Network (CTAN) to integrate these blocks into existing architectures. CTAN outperforms baselines on the two proposed datasets and on one existing dataset, EPIC$_{cvpr20}$.

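To make the idea of channel-wise and temporal-wise reweighting concrete, here is a minimal sketch in plain Python. It is not the block defined in the paper: the function name `channel_temporal_attention`, the C x T list layout, and the use of simple averages followed by a sigmoid gate (channels) and a softmax (frames) are all assumptions chosen for illustration, loosely in the spirit of squeeze-and-excitation-style attention.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def channel_temporal_attention(features):
    """Reweight a clip feature map given as C channels x T frames.

    Hypothetical helper for illustration; not the block from the paper.
    """
    num_channels = len(features)
    num_frames = len(features[0])

    # Channel attention: average each channel over time, gate with a sigmoid.
    channel_gate = [sigmoid(sum(channel) / num_frames) for channel in features]

    # Temporal attention: average each frame over channels, softmax over frames.
    frame_scores = [
        sum(features[c][t] for c in range(num_channels)) / num_channels
        for t in range(num_frames)
    ]
    frame_weights = softmax(frame_scores)

    # Apply both weightings to every activation.
    attended = [
        [features[c][t] * channel_gate[c] * frame_weights[t]
         for t in range(num_frames)]
        for c in range(num_channels)
    ]
    return attended, channel_gate, frame_weights


# Toy clip: 2 channels, 3 frames (made-up numbers).
clip = [[1.0, 3.0, 1.0], [0.0, 0.0, 0.0]]
attended, channel_gate, frame_weights = channel_temporal_attention(clip)
```

In this toy clip the middle frame has the highest average activation, so it receives the largest temporal weight, and the more active first channel gets the larger gate. A trainable block would presumably replace the fixed averages with learned layers; that part is left out here.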

Related content

Attention mechanism

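The attention mechanism was first proposed in the field of visual images, but it arguably took off with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied an attention mechanism to an RNN model for image classification. Subsequently, Bahdanau et al., in "Neural Machine Translation by Jointly Learning to Align and Translate" [1], used an attention-like mechanism to perform translation and alignment simultaneously on a machine translation task; their work is regarded as the first to apply the attention mechanism to NLP. Attention-based extensions of RNN models were then applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the rough trend of progress in attention research.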

CVPR 2021 accepted papers list: all 1,663 papers are here

Zhuanzhi member service

32+ reads · June 12, 2021

[How to Do Research] How to research, 22-page slide deck

Zhuanzhi member service

112+ reads · April 17, 2021

[Microsoft Research Asia] Geometry-aware Domain Adaptation for Unsupervised Alignment of Word Embeddings

Zhuanzhi member service

23+ reads · April 21, 2020

CVPR 2020 papers: collection of open-source projects

Zhuanzhi member service

110+ reads · March 12, 2020