Although Temporal Sentence Grounding in Videos (TSGV) has achieved impressive progress over the last few years, current TSGV models tend to capture moment annotation biases and fail to take full advantage of multi-modal inputs. Surprisingly, some extremely simple TSGV baselines, even without any training, can achieve state-of-the-art performance. In this paper, we first take a closer look at the existing evaluation protocol, and argue that both the prevailing datasets and metrics are the devils that cause unreliable benchmarking. To this end, we propose to re-organize two widely-used TSGV datasets (Charades-STA and ActivityNet Captions), deliberately \textbf{C}hanging the moment annotation \textbf{D}istribution of the test split to make it different from that of the training split; the re-organized datasets are dubbed Charades-CD and ActivityNet-CD, respectively. Meanwhile, we introduce a new evaluation metric, ``dR@$n$,IoU@$m$'', which calibrates the basic IoU scores by penalizing over-long moment predictions more heavily, thereby reducing the inflated performance caused by moment annotation biases. Under this new evaluation protocol, we conduct extensive experiments and ablation studies on eight state-of-the-art TSGV models. All the results demonstrate that the re-organized datasets and the new metric can better monitor progress in TSGV, which is still far from satisfactory. The repository of this work is at \url{https://github.com/yytzsy/grounding_changing_distribution}.
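For concreteness, below is a minimal Python sketch of how such a discounted recall could be computed for the top-1 prediction (i.e., dR@$1$,IoU@$m$). It assumes start/end timestamps normalized by video duration and boundary-distance discount factors of the form $\alpha = 1 - |p - g|$; the function names and the exact discounting scheme are illustrative assumptions, not the authors' reference implementation.

\begin{verbatim}
# Sketch of a discounted recall dR@1,IoU@m. Assumes each moment is a
# (start, end) pair normalized to [0, 1] by the video duration, and that
# the discount multiplies the recall hit by (1 - |boundary error|) for
# both the start and end timestamps (an assumption for illustration).

def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def discounted_recall_at_1(preds, gts, m=0.5):
    """preds, gts: lists of normalized (start, end) pairs, one per query."""
    total = 0.0
    for (ps, pe), (gs, ge) in zip(preds, gts):
        # Standard recall term: 1 if the top-1 prediction clears IoU >= m.
        hit = 1.0 if temporal_iou((ps, pe), (gs, ge)) >= m else 0.0
        # Discount by boundary drift: an over-long prediction can pass the
        # IoU threshold yet still be penalized for imprecise endpoints.
        alpha_s = 1.0 - abs(ps - gs)
        alpha_e = 1.0 - abs(pe - ge)
        total += hit * alpha_s * alpha_e
    return total / len(preds)
\end{verbatim}

Under this form, a trivially long prediction that swallows the whole video may still satisfy IoU $\geq m$ for short ground-truth moments, but its large start/end errors shrink $\alpha_s \alpha_e$ and thus its contribution to the score.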