Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations


Video representation · Video · Representation · Temporal dependency · Joint learning

March 31, 2023

Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations


Yiwu Zhong, Licheng Yu, Yang Bai, Shangwen Li, Xueting Yan, Yin Li

From arXiv. Accepted to CVPR 2023.

The abundance of instructional videos and their narrations over the Internet offers an exciting avenue for understanding procedural activities. In this work, we propose to learn video representation that encodes both action steps and their temporal ordering, based on a large-scale dataset of web instructional videos and their narrations, without using human annotations. Our method jointly learns a video representation to encode individual step concepts, and a deep probabilistic model to capture both temporal dependencies and immense individual variations in the step ordering. We empirically demonstrate that learning temporal ordering not only enables new capabilities for procedure reasoning, but also reinforces the recognition of individual steps. Our model significantly advances the state-of-the-art results on step classification (+2.8% / +3.3% on COIN / EPIC-Kitchens) and step forecasting (+7.4% on COIN). Moreover, our model attains promising results in zero-shot inference for step classification and forecasting, as well as in predicting diverse and plausible steps for incomplete procedures. Our code is available at https://github.com/facebookresearch/ProcedureVRL.
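To make the idea of step forecasting from temporal ordering concrete, here is a minimal sketch. It is not the authors' method: the abstract describes a deep probabilistic model learned jointly with the video representation, whereas this toy swaps in a plain first-order transition-count model over step labels. The function names (`fit_transitions`, `forecast`) and the cooking sequences are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Estimate P(next step | previous step) by counting adjacent step pairs."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    # Normalize each row of counts into a probability distribution.
    return {
        prev: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for prev, nxts in counts.items()
    }


def forecast(transitions, last_step, k=2):
    """Return up to k most likely next steps as (step, probability) pairs."""
    dist = transitions.get(last_step, {})
    # Sort by descending probability, breaking ties alphabetically.
    return sorted(dist.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical step sequences: the same task performed in different orders.
sequences = [
    ["boil water", "add pasta", "drain", "add sauce"],
    ["boil water", "add salt", "add pasta", "drain"],
    ["boil water", "add pasta", "add sauce"],
]

transitions = fit_transitions(sequences)
for step, prob in forecast(transitions, "add pasta"):
    print(f"{step}: {prob:.2f}")
```

Even this toy keeps more than one plausible next step after "add pasta", which is the kind of variation in step ordering that the abstract says the probabilistic model is meant to capture.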

