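[Overview] This article collects research resources on video analysis, with a focus on multimodal learning for video, including papers, code, and datasets. Topics covered include multimodal video analysis, video moment localization, video retrieval, video advertisements, commonsense reasoning, video highlight prediction, object tracking, audio-visual dialog systems, and action recognition.
Datasets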
AVA dataset
Provides video annotations that support the understanding of human activity
https://research.google.com/ava/index.html
PyVideoResearch
A repository for video research, covering common methods, datasets, and tasks
https://github.com/gsig/PyVideoResearch
How2 Dataset
A dataset for multimodal language learning
https://arxiv.org/pdf/1811.00347.pdf
https://github.com/srvk/how2-dataset
Video moment dataset (Moments in Time)
https://github.com/metalbubble/moments_models
http://moments.csail.mit.edu/
Pretrained PyTorch models for video and images
https://github.com/alexandonian/pretorched-x
YouTube-8M dataset
https://ai.googleblog.com/2019/06/announcing-youtube-8m-segments-dataset.html
Tools
DCASE
Utility functions for acoustic scene and sound event classification and detection (dcase_util)
https://dcase-repo.github.io/dcase_util/index.html
Papers
Action Recognition
Long-Term Feature Banks for Detailed Video Understanding (CVPR2019)
https://arxiv.org/pdf/1812.05038.pdf
https://github.com/facebookresearch/video-long-term-feature-banks
Deep Learning for Video Classification and Captioning
https://arxiv.org/pdf/1609.06782.pdf
Large-scale Video Classification with Convolutional Neural Networks
https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/42455.pdf
Learning Spatiotemporal Features with 3D Convolutional Networks
http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Tran_Learning_Spatiotemporal_Features_ICCV_2015_paper.pdf
Two-Stream Convolutional Networks for Action Recognition in Videos
https://papers.nips.cc/paper/5353-two-stream-convolutional-networks-for-action-recognition-in-videos.pdf
Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Wang_Action_Recognition_With_2015_CVPR_paper.pdf
Non-local Neural Networks
http://openaccess.thecvf.com/content_cvpr_2018/papers/Wang_Non-Local_Neural_Networks_CVPR_2018_paper.pdf
Learning Correspondence from the Cycle-consistency of Time
https://arxiv.org/pdf/1903.07593.pdf
https://github.com/xiaolonw/TimeCycle
3D ConvNets in PyTorch
https://github.com/Tushar-N/pytorch-resnet3d
Multimodal Video Analysis
Awesome list for multimodal learning
https://github.com/pliang279/multimodal-ml-reading-list
VideoBERT: A Joint Model for Video and Language Representation Learning
https://arxiv.org/abs/1904.01766
AENet: Learning Deep Audio Features for Video Analysis
https://arxiv.org/pdf/1701.00599.pdf
https://github.com/znaoya/aenet
Look, Listen and Learn
https://arxiv.org/pdf/1705.08168.pdf
Objects that Sound
https://arxiv.org/pdf/1712.06651
Learning to Separate Object Sounds by Watching Unlabeled Video
https://arxiv.org/pdf/1804.01665.pdf
Ambient Sound Provides Supervision for Visual Learning
http://www.eccv2016.org/files/posters/O-1B-01.pdf
Video Moment Localization
Localizing Moments in Video with Natural Language
https://arxiv.org/pdf/1708.01641.pdf
https://github.com/LisaAnne/LocalizingMoments
Video Retrieval
Learning a Text-Video Embedding from Incomplete and Heterogeneous Data
https://arxiv.org/pdf/1804.02516.pdf
https://github.com/antoine77340/Mixture-of-Embedding-Experts
Cross-Modal and Hierarchical Modeling of Video and Text
https://arxiv.org/pdf/1810.07212.pdf
A Dataset for Movie Description
https://arxiv.org/pdf/1501.02530.pdf
Web-scale Multimedia Search for Internet Video Content
http://www.lujiang.info/resources/Thesis.pdf
Video Advertisements
Automatic Understanding of Image and Video Advertisements
http://openaccess.thecvf.com/content_cvpr_2017/papers/Hussain_Automatic_Understanding_of_CVPR_2017_paper.pdf
http://people.cs.pitt.edu/~kovashka/ads/
Multimodal Representation of Advertisements Using Segment-level Autoencoders
https://sail.usc.edu/publications/files/p418-somandepalli.pdf
https://github.com/usc-sail/mica-multimodal-ads
Story Understanding in Video Advertisements
http://people.cs.pitt.edu/~kovashka/ye_buettner_kovashka_bmvc2018.pdf
https://github.com/yekeren/Story-Video_ads_understanding
ADVISE: Symbolism and External Knowledge for Decoding Advertisements
http://people.cs.pitt.edu/~kovashka/ye_kovashka_advise_eccv2018.pdf
https://github.com/yekeren/ADVISE
Video Commonsense Reasoning
From Recognition to Cognition: Visual Commonsense Reasoning
https://arxiv.org/pdf/1811.10830.pdf
https://visualcommonsense.com/
Video Highlight Prediction
Video Highlight Prediction Using Audience Chat Reactions
Object Tracking
SenseTime's research platform for single object tracking research, implementing algorithms like SiamRPN and SiamMask
https://github.com/STVIR/pysot
Audio-Visual Dialog
https://github.com/batra-mlp-lab/avsd