A Big List of Papers, Code, and Datasets for Video Analysis and Multimodal Learning

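[Overview] This article collects research resources on video analysis, especially multimodal learning for video, including papers, code, and datasets. Topics covered include multimodal video analysis, video moment localization, video retrieval, video advertisements, commonsense reasoning, video highlights, object tracking, audio-visual dialog systems, and action recognition.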


Datasets

AVA dataset

Provides video annotations that help with understanding human activity

https://research.google.com/ava/index.html


PyVideoResearch

A repository for video research, covering common methods, datasets, and tasks

https://github.com/gsig/PyVideoResearch


How2 Dataset

A dataset for multimodal language learning

https://arxiv.org/pdf/1811.00347.pdf

https://github.com/srvk/how2-dataset


Video moments dataset (Moments in Time)

https://github.com/metalbubble/moments_models

http://moments.csail.mit.edu/


Pretrained PyTorch models for video and images

https://github.com/alexandonian/pretorched-x


YouTube-8M dataset

https://ai.googleblog.com/2019/06/announcing-youtube-8m-segments-dataset.html


Tools

DCASE

Utility functions for acoustic scene and sound event classification and detection

https://dcase-repo.github.io/dcase_util/index.html


Papers

Action Recognition

Long-Term Feature Banks for Detailed Video Understanding (CVPR2019)

https://arxiv.org/pdf/1812.05038.pdf

https://github.com/facebookresearch/video-long-term-feature-banks


Deep Learning for Video Classification and Captioning

https://arxiv.org/pdf/1609.06782.pdf


Large-scale Video Classification with Convolutional Neural Networks

https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/42455.pdf


Learning Spatiotemporal Features with 3D Convolutional Networks

http://www.cvfoundation.org/openaccess/content_iccv_2015/papers/Tran_Learning_Spatiotemporal_Features_ICCV_2015_paper.pdf


Two-Stream Convolutional Networks for Action Recognition in Videos

https://papers.nips.cc/paper/5353-two-stream-convolutional-networks-for-action-recognition-in-videos.pdf


Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors

http://www.cvfoundation.org/openaccess/content_cvpr_2015/papers/Wang_Action_Recognition_With_2015_CVPR_paper.pdf


Non-local Neural Networks

http://openaccess.thecvf.com/content_cvpr_2018/papers/Wang_Non-Local_Neural_Networks_CVPR_2018_paper.pdf


Learning Correspondence from the Cycle-consistency of Time

https://arxiv.org/pdf/1903.07593.pdf

https://github.com/xiaolonw/TimeCycle


3D ConvNets in PyTorch

https://github.com/Tushar-N/pytorch-resnet3d


Multimodal Video Analysis

Awesome list for multimodal learning

https://github.com/pliang279/multimodal-ml-reading-list


VideoBERT: A Joint Model for Video and Language Representation Learning

https://arxiv.org/abs/1904.01766


AENet: Learning Deep Audio Features for Video Analysis

https://arxiv.org/pdf/1701.00599.pdf

https://github.com/znaoya/aenet


Look, Listen and Learn

https://arxiv.org/pdf/1705.08168.pdf


Objects that Sound

https://arxiv.org/pdf/1712.06651


Learning to Separate Object Sounds by Watching Unlabeled Video

https://arxiv.org/pdf/1804.01665.pdf

Ambient Sound Provides Supervision for Visual Learning

http://www.eccv2016.org/files/posters/O-1B-01.pdf


Video Moment Localization

Localizing Moments in Video with Natural Language

https://arxiv.org/pdf/1708.01641.pdf

https://github.com/LisaAnne/LocalizingMoments


Video Retrieval

Learning a Text-Video Embedding from Incomplete and Heterogeneous Data

https://arxiv.org/pdf/1804.02516.pdf

https://github.com/antoine77340/Mixture-of-Embedding-Experts


Cross-Modal and Hierarchical Modeling of Video and Text

https://arxiv.org/pdf/1810.07212.pdf


A Dataset for Movie Description

https://arxiv.org/pdf/1501.02530.pdf


Web-scale Multimedia Search for Internet Video Content

http://www.lujiang.info/resources/Thesis.pdf


Video Advertisements

Automatic Understanding of Image and Video Advertisements

http://openaccess.thecvf.com/content_cvpr_2017/papers/Hussain_Automatic_Understanding_of_CVPR_2017_paper.pdf

http://people.cs.pitt.edu/~kovashka/ads/


Multimodal Representation of Advertisements Using Segment-level Autoencoders

https://sail.usc.edu/publications/files/p418-somandepalli.pdf

https://github.com/usc-sail/mica-multimodal-ads


Story Understanding in Video Advertisements

http://people.cs.pitt.edu/~kovashka/ye_buettner_kovashka_bmvc2018.pdf

https://github.com/yekeren/Story-Video_ads_understanding


ADVISE: Symbolism and External Knowledge for Decoding Advertisements

http://people.cs.pitt.edu/~kovashka/ye_kovashka_advise_eccv2018.pdf

https://github.com/yekeren/ADVISE


Video Commonsense Reasoning

From Recognition to Cognition: Visual Commonsense Reasoning

https://arxiv.org/pdf/1811.10830.pdf

https://visualcommonsense.com/


Video Highlight Prediction

Video Highlight Prediction Using Audience Chat Reactions


Object Tracking

SenseTime's research platform for single object tracking research, implementing algorithms like SiamRPN and SiamMask

https://github.com/STVIR/pysot


Audio-Visual Dialog

https://github.com/batra-mlp-lab/avsd
