This report describes our solution to the captioning task of the VALUE Challenge 2021. Our solution, named CLIP4Caption++, is built on X-Linear/X-Transformer, an advanced encoder-decoder architecture, and introduces the following improvements: 1) we utilize three strong pre-trained CLIP models to extract text-related appearance features; 2) we adopt the TSN sampling strategy for data enhancement; 3) we incorporate video subtitle information, fused with the visual features as guidance, to provide richer semantic cues; 4) we design word-level and sentence-level ensemble strategies. Our proposed method achieves CIDEr scores of 86.5, 148.4, and 64.5 on the VATEX, YC2C, and TVC datasets, respectively, demonstrating the superior performance of CLIP4Caption++ on all three benchmarks.
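As a rough illustration of points 1) and 2), the sketch below combines TSN-style segment sampling with CLIP image-feature extraction. It is a minimal sketch, not the report's exact pipeline: the helper names (`sample_tsn_indices`, `extract_clip_features`) and the choice of the ViT-B/16 checkpoint are assumptions for illustration only, and frames are assumed to be already preprocessed into CLIP's input space.

```python
# Minimal sketch (assumed, not the authors' exact pipeline): TSN-style frame
# sampling followed by CLIP visual feature extraction.
import torch
import clip  # OpenAI CLIP: https://github.com/openai/CLIP


def sample_tsn_indices(num_frames: int, num_segments: int = 20) -> list:
    """Split the video into equal segments and draw one random frame per segment."""
    seg_len = num_frames / num_segments
    return [
        int(seg_len * i) + int(torch.randint(0, max(int(seg_len), 1), (1,)).item())
        for i in range(num_segments)
    ]


@torch.no_grad()
def extract_clip_features(frames: torch.Tensor, model_name: str = "ViT-B/16") -> torch.Tensor:
    """frames: (T, 3, H, W) tensor of RGB frames already preprocessed for CLIP."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, _ = clip.load(model_name, device=device)          # hypothetical checkpoint choice
    indices = sample_tsn_indices(frames.shape[0])
    sampled = frames[indices].to(device)
    feats = model.encode_image(sampled)                      # (num_segments, feature_dim)
    return feats / feats.norm(dim=-1, keepdim=True)          # L2-normalised appearance features
```

In practice, features extracted this way from several CLIP backbones could be concatenated or fed separately to the encoder-decoder captioning model; the exact fusion used in CLIP4Caption++ is described in the following sections.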