自动编码带遮罩的人脸视频表示学习（MARLIN） (MARLIN: Masked Autoencoder for facial video Representation LearnINg) - 专知论文

会员服务 ·

0

掩码 · 自编码器 · Learning · 无监督 · 表示学习 ·

2023 年 3 月 22 日

MARLIN: Masked Autoencoder for facial video Representation LearnINg

翻译：自动编码带遮罩的人脸视频表示学习（MARLIN）

Zhixi Cai,Shreya Ghosh,Kalin Stefanov,Abhinav Dhall,Jianfei Cai,Hamid Rezatofighi,Reza Haffari,Munawar Hayat

from arxiv, CVPR 2023

This paper proposes a self-supervised approach to learn universal facial representations from videos, that can transfer across a variety of facial analysis tasks such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS). Our proposed framework, named MARLIN, is a facial video masked autoencoder, that learns highly robust and generic facial embeddings from abundantly available non-annotated web crawled facial videos. As a challenging auxiliary task, MARLIN reconstructs the spatio-temporal details of the face from the densely masked facial regions which mainly include eyes, nose, mouth, lips, and skin to capture local and global aspects that in turn help in encoding generic and transferable features. Through a variety of experiments on diverse downstream tasks, we demonstrate MARLIN to be an excellent facial video encoder as well as feature extractor, that performs consistently well across a variety of downstream tasks including FAR (1.13% gain over supervised benchmark), FER (2.64% gain over unsupervised benchmark), DFD (1.86% gain over unsupervised benchmark), LS (29.36% gain for Frechet Inception Distance), and even in low data regime. Our code and models are available at https://github.com/ControlNet/MARLIN .

翻译：本文提出了一种自我监督的方法来学习视频中通用的人脸表示，能够在各种人脸分析任务之间进行转移，例如面部属性识别（FAR），面部表情识别（FER），DeepFake检测（DFD）和嘴唇同步（LS）。我们提出的框架名为MARLIN，是一种人脸视频遮罩自动编码器，可以从大量可用的非注释网络爬取人脸视频中学习高度鲁棒和通用的人脸嵌入。作为一项具有挑战性的辅助任务，MARLIN可以从密集掩蔽的人脸区域（主要包括眼睛、鼻子、嘴巴、唇部和皮肤）中重构面部的时空细节，以捕捉局部和全局方面的信息，从而有助于编码通用和可转移的特征。通过各种不同的下游任务的实验，我们展示了MARLIN作为优秀的人脸视频编码器和特征提取器，能够在各种下游任务中保持相对一致的表现，包括FAR（相对于监督基准提高了1.13%），FER（相对于无监督基准提高了2.64%），DFD（相对于无监督基准提高了1.86%），LS（Frechet Inception Distance提高了29.36%），甚至在低数据情况下也能达到很好的效果。我们的代码和模型可在 https://github.com/ControlNet/MARLIN 中获得。

0

相关内容

用于识别任务的视觉 Transformer 综述

用于识别任务的视觉 Transformer 综述

专知会员服务

74+阅读 · 2023年2月25日

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

专知会员服务

27+阅读 · 2022年3月3日

【CVPR 2022】可控图像合成与编辑的合成生成先验学习，SemanticStyleGAN: Learning Compositonal Generative Priors for Controllable Image Synthesis and Editing

【CVPR 2022】可控图像合成与编辑的合成生成先验学习，SemanticStyleGAN: Learning Compositonal Generative Priors for Controllable Image Synthesis and Editing

专知会员服务

23+阅读 · 2022年3月3日

【CVPR2020-Oral-牛津-Facebook】从单个图像进行端到端的视图合成，SynSin-View Synthesis

【CVPR2020-Oral-牛津-Facebook】从单个图像进行端到端的视图合成，SynSin-View Synthesis

专知会员服务

29+阅读 · 2020年3月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【CVPR2020】从未标记的视频中学习视频对象分割，Learning Video Object Segmentation from Unlabeled Videos

【CVPR2020】从未标记的视频中学习视频对象分割，Learning Video Object Segmentation from Unlabeled Videos

专知会员服务

36+阅读 · 2020年3月12日

基于破坏和构造学习的细粒度图像识别（Destruction and Construction Learning for Fine-grained Image Recognition）

基于破坏和构造学习的细粒度图像识别（Destruction and Construction Learning for Fine-grained Image Recognition）

专知会员服务

20+阅读 · 2020年1月26日

【论文推荐】小样本视频合成，Few-shot Video-to-Video Synthesis

【论文推荐】小样本视频合成，Few-shot Video-to-Video Synthesis

专知会员服务

24+阅读 · 2019年12月15日

【AAAI2020论文-腾讯】通过稠密边界发生器快速学习时间动作方案（Fast Learning of Temporal Action Proposal via Dense Boundary Generator）

【AAAI2020论文-腾讯】通过稠密边界发生器快速学习时间动作方案（Fast Learning of Temporal Action Proposal via Dense Boundary Generator）

专知会员服务

12+阅读 · 2019年11月15日

【视频中的零样本动作识别：综述】Zero-Shot Action Recognition in Videos: A Survey

【视频中的零样本动作识别：综述】Zero-Shot Action Recognition in Videos: A Survey

专知会员服务

39+阅读 · 2019年10月12日

给我1张图，生成30秒视频！｜DeepMind新作

给我1张图，生成30秒视频！｜DeepMind新作

新智元

0+阅读 · 2022年8月19日

KDD 2019论文解读:异构信息网络上的对抗生成学习

KDD 2019论文解读:异构信息网络上的对抗生成学习

云栖社区

23+阅读 · 2019年8月21日

视频分析/多模态学习论文、代码、数据集大列表

视频分析/多模态学习论文、代码、数据集大列表

专知

57+阅读 · 2019年7月13日

人脸相关文献代码集锦：人脸检测、人脸识别、人脸生成等

人脸相关文献代码集锦：人脸检测、人脸识别、人脸生成等

专知

15+阅读 · 2019年5月20日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【跟踪Tracking】15篇论文+代码 | 中秋快乐~

【跟踪Tracking】15篇论文+代码 | 中秋快乐~

专知

18+阅读 · 2018年9月24日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

专知

12+阅读 · 2018年6月9日

MoCoGAN 分解运动和内容的视频生成

MoCoGAN 分解运动和内容的视频生成

CreateAMind

18+阅读 · 2017年10月21日

基于深度学习的高分辨率PolSAR影像暗目标判别

国家自然科学基金

3+阅读 · 2015年12月31日

基于图像配准与表示联合优化的自动人脸识别研究

国家自然科学基金

0+阅读 · 2014年12月31日

棉花GhCAD6基因在棉花纤维发育中的功能及调控机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

三维视频显著对象编码及可靠传输

国家自然科学基金

0+阅读 · 2013年12月31日

自然场景下机器人大范围视觉伺服研究

国家自然科学基金

1+阅读 · 2012年12月31日

非受约束环境下双目视频监控人脸序列组配准模型研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于三维视频的人脸表情识别研究

国家自然科学基金

0+阅读 · 2011年12月31日

视频监控中活动人物的视觉理解

国家自然科学基金

1+阅读 · 2009年12月31日

基于2D视频视觉关注度的3D重建方法研究

国家自然科学基金

0+阅读 · 2009年12月31日

异构的P2P覆盖网环境中容错的视频编码及传输

国家自然科学基金

0+阅读 · 2009年12月31日

BlendFields: Few-Shot Example-Driven Facial Modeling

Arxiv

0+阅读 · 2023年5月12日

Relightify: Relightable 3D Faces from a Single Image via Diffusion Models

Arxiv

1+阅读 · 2023年5月10日

A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond

Arxiv

10+阅读 · 2022年7月30日

Prompt Distribution Learning

Arxiv

14+阅读 · 2022年5月6日

Masked Autoencoders Are Scalable Vision Learners

Arxiv

27+阅读 · 2021年11月11日

Cross-Modal Discrete Representation Learning

Arxiv

18+阅读 · 2021年6月10日

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Arxiv

18+阅读 · 2021年4月4日

Spatially Consistent Representation Learning

Arxiv

14+阅读 · 2021年3月10日

Temporal Relational Modeling with Self-Supervision for Action Segmentation

Arxiv

13+阅读 · 2020年12月14日

Deep Representation Learning for Domain Adaptation of Semantic Image Segmentation

Arxiv

10+阅读 · 2018年5月10日

VIP会员

文章信息

相关主题

相关VIP内容

用于识别任务的视觉 Transformer 综述

用于识别任务的视觉 Transformer 综述

专知会员服务

74+阅读 · 2023年2月25日

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

专知会员服务

27+阅读 · 2022年3月3日

【CVPR 2022】可控图像合成与编辑的合成生成先验学习，SemanticStyleGAN: Learning Compositonal Generative Priors for Controllable Image Synthesis and Editing

【CVPR 2022】可控图像合成与编辑的合成生成先验学习，SemanticStyleGAN: Learning Compositonal Generative Priors for Controllable Image Synthesis and Editing

专知会员服务

23+阅读 · 2022年3月3日

【CVPR2020-Oral-牛津-Facebook】从单个图像进行端到端的视图合成，SynSin-View Synthesis

【CVPR2020-Oral-牛津-Facebook】从单个图像进行端到端的视图合成，SynSin-View Synthesis

专知会员服务

29+阅读 · 2020年3月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【CVPR2020】从未标记的视频中学习视频对象分割，Learning Video Object Segmentation from Unlabeled Videos

【CVPR2020】从未标记的视频中学习视频对象分割，Learning Video Object Segmentation from Unlabeled Videos

专知会员服务

36+阅读 · 2020年3月12日

基于破坏和构造学习的细粒度图像识别（Destruction and Construction Learning for Fine-grained Image Recognition）

基于破坏和构造学习的细粒度图像识别（Destruction and Construction Learning for Fine-grained Image Recognition）

专知会员服务

20+阅读 · 2020年1月26日

【论文推荐】小样本视频合成，Few-shot Video-to-Video Synthesis

【论文推荐】小样本视频合成，Few-shot Video-to-Video Synthesis

专知会员服务

24+阅读 · 2019年12月15日

【AAAI2020论文-腾讯】通过稠密边界发生器快速学习时间动作方案（Fast Learning of Temporal Action Proposal via Dense Boundary Generator）

【AAAI2020论文-腾讯】通过稠密边界发生器快速学习时间动作方案（Fast Learning of Temporal Action Proposal via Dense Boundary Generator）

专知会员服务

12+阅读 · 2019年11月15日

【视频中的零样本动作识别：综述】Zero-Shot Action Recognition in Videos: A Survey

【视频中的零样本动作识别：综述】Zero-Shot Action Recognition in Videos: A Survey

专知会员服务

39+阅读 · 2019年10月12日

热门VIP内容

开通专知VIP会员享更多权益服务

不确定环境下无人机三维路径规划研究 | 221页

远征作战军事后勤规划

大语言模型将如何改变军事指挥结构

美陆军能力集成与开发系统（ACIDS）流程指南 | 2025最新122页

相关资讯

给我1张图，生成30秒视频！｜DeepMind新作

给我1张图，生成30秒视频！｜DeepMind新作

新智元

0+阅读 · 2022年8月19日

KDD 2019论文解读:异构信息网络上的对抗生成学习

KDD 2019论文解读:异构信息网络上的对抗生成学习

云栖社区

23+阅读 · 2019年8月21日

视频分析/多模态学习论文、代码、数据集大列表

视频分析/多模态学习论文、代码、数据集大列表

专知

57+阅读 · 2019年7月13日

人脸相关文献代码集锦：人脸检测、人脸识别、人脸生成等

人脸相关文献代码集锦：人脸检测、人脸识别、人脸生成等

专知

15+阅读 · 2019年5月20日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【跟踪Tracking】15篇论文+代码 | 中秋快乐~

【跟踪Tracking】15篇论文+代码 | 中秋快乐~

专知

18+阅读 · 2018年9月24日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

专知

12+阅读 · 2018年6月9日

MoCoGAN 分解运动和内容的视频生成

MoCoGAN 分解运动和内容的视频生成

CreateAMind

18+阅读 · 2017年10月21日

相关论文

BlendFields: Few-Shot Example-Driven Facial Modeling

Arxiv

0+阅读 · 2023年5月12日

Relightify: Relightable 3D Faces from a Single Image via Diffusion Models

Arxiv

1+阅读 · 2023年5月10日

A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond

Arxiv

10+阅读 · 2022年7月30日

Prompt Distribution Learning

Arxiv

14+阅读 · 2022年5月6日

Masked Autoencoders Are Scalable Vision Learners

Arxiv

27+阅读 · 2021年11月11日

Cross-Modal Discrete Representation Learning

Arxiv

18+阅读 · 2021年6月10日

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Arxiv

18+阅读 · 2021年4月4日

Spatially Consistent Representation Learning

Arxiv

14+阅读 · 2021年3月10日

Temporal Relational Modeling with Self-Supervision for Action Segmentation

Arxiv

13+阅读 · 2020年12月14日

Deep Representation Learning for Domain Adaptation of Semantic Image Segmentation

Arxiv

10+阅读 · 2018年5月10日

相关基金

基于深度学习的高分辨率PolSAR影像暗目标判别

国家自然科学基金

3+阅读 · 2015年12月31日

基于图像配准与表示联合优化的自动人脸识别研究

国家自然科学基金

0+阅读 · 2014年12月31日

棉花GhCAD6基因在棉花纤维发育中的功能及调控机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

三维视频显著对象编码及可靠传输

国家自然科学基金

0+阅读 · 2013年12月31日

自然场景下机器人大范围视觉伺服研究

国家自然科学基金

1+阅读 · 2012年12月31日

非受约束环境下双目视频监控人脸序列组配准模型研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于三维视频的人脸表情识别研究

国家自然科学基金

0+阅读 · 2011年12月31日

视频监控中活动人物的视觉理解

国家自然科学基金

1+阅读 · 2009年12月31日

基于2D视频视觉关注度的3D重建方法研究

国家自然科学基金

0+阅读 · 2009年12月31日

异构的P2P覆盖网环境中容错的视频编码及传输

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员