[CVPR 2020, Oxford, Google] Speech2Action: Cross-modal Supervision for Action Recognition

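Is it possible to guess human actions from dialogue alone? In this work, we investigate the link between spoken words and actions in movies. We note that movie screenplays describe actions and also contain the characters' speech, so they can be used to learn this correlation with no additional supervision. We train a BERT-based Speech2Action classifier on over a thousand movie screenplays to predict action labels from transcribed speech segments. We then apply this model to the speech segments of a large unlabelled movie corpus (188M speech segments from 288K hours of movies). Using the model's predictions, we obtain weak action labels for over 800K video clips. By training on these video clips, we demonstrate superior action recognition performance on standard action recognition benchmarks, without using a single manually labelled action example.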
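The weak-labelling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the BERT-based classifier is replaced by a toy keyword scorer (`toy_classifier`), and the names `SpeechSegment`, `WeakLabel`, `mine_weak_labels`, the keyword table, the confidence `threshold`, and the clip padding `pad` are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SpeechSegment:
    video_id: str
    start: float  # seconds from the start of the video
    end: float
    text: str     # transcribed speech


@dataclass
class WeakLabel:
    video_id: str
    clip_start: float
    clip_end: float
    action: str
    score: float


# Hypothetical keyword cues; a toy stand-in for a trained text classifier.
KEYWORD_CUES = {
    "phone": ("hello", "calling", "call"),
    "drive": ("car", "pull over", "seat belt"),
    "run": ("run", "hurry", "chase"),
}


def toy_classifier(text: str) -> Dict[str, float]:
    """Score each action by the fraction of its cue phrases found in the text."""
    lowered = text.lower()
    return {
        action: sum(cue in lowered for cue in cues) / len(cues)
        for action, cues in KEYWORD_CUES.items()
    }


def mine_weak_labels(
    segments: List[SpeechSegment],
    classify: Callable[[str], Dict[str, float]],
    threshold: float = 0.3,  # assumed confidence cut-off
    pad: float = 5.0,        # assumed padding (seconds) around the speech
) -> List[WeakLabel]:
    """Keep confident predictions and attach them to a clip around the speech."""
    labels = []
    for seg in segments:
        scores = classify(seg.text)
        action, score = max(scores.items(), key=lambda kv: kv[1])
        if score >= threshold:
            labels.append(
                WeakLabel(
                    video_id=seg.video_id,
                    clip_start=max(0.0, seg.start - pad),
                    clip_end=seg.end + pad,
                    action=action,
                    score=score,
                )
            )
    return labels


segments = [
    SpeechSegment("m1", 12.0, 14.0, "Hello, who is calling?"),
    SpeechSegment("m1", 2.0, 3.0, "Nice weather today."),
    SpeechSegment("m2", 1.0, 2.0, "Hurry, run!"),
]
weak_labels = mine_weak_labels(segments, toy_classifier)
for label in weak_labels:
    print(label)
```

In this sketch, only segments whose top score clears the threshold produce a weakly labelled clip; the resulting clips would then serve as training data for a video action recognition model, with no manual labels involved.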

Related content

CVPR 2020

CVPR is the premier annual computer vision event comprising the main conference and several co-located workshops and short courses. With its high quality and low cost, it provides an exceptional value for students, academics and industry researchers. CVPR 2020 will take place at The Washington State Convention Center in Seattle, WA, from June 16 to June 20, 2020. http://cvpr2020.thecvf.com/

[DeepMind, Oxford, CMU, CVPR 2020] Visual Grounding for Unsupervised Word Mapping, Visual Grounding in Video

12+ reads · March 13, 2020

[DeepMind, Oxford, CMU, CVPR 2020] Visual Grounding in Video for Unsupervised Word Translation

13+ reads · March 12, 2020

[CVPR 2020] Multi-Modal Domain Adaptation for Fine-Grained Action Recognition

78+ reads · February 25, 2020

[BUPT, Tencent AI] Self-supervised learning for audio-visual speaker diarization

26+ reads · February 16, 2020