Low Bandwidth Video-Chat Compression using Deep Generative Models

Maxime Oquab, Pierre Stock, Oran Gafni, Daniel Haziza, Tao Xu, Peizhao Zhang, Onur Celebi, Yana Hasson, Patrick Labatut, Bobo Bose-Kolanu, Thibault Peyronel, Camille Couprie

From arXiv, 11 pages

To unlock video chat for hundreds of millions of people hindered by poor connectivity or unaffordable data costs, we propose to authentically reconstruct faces on the receiver's device using facial landmarks extracted at the sender's side and transmitted over the network. In this context, we discuss and evaluate the benefits and disadvantages of several deep adversarial approaches. In particular, we explore quality and bandwidth trade-offs for approaches based on static landmarks, dynamic landmarks or segmentation maps. We design a mobile-compatible architecture based on the first-order animation model of Siarohin et al. In addition, we leverage SPADE blocks to refine results in important areas such as the eyes and lips. We compress the networks down to about 3MB, allowing the models to run in real time on an iPhone 8 (CPU). This approach enables video calling at a few kbits per second, an order of magnitude lower than currently available alternatives.
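The core idea above is that only facial landmarks, not pixels, cross the network. As a rough illustration of what such a payload costs, here is a minimal sketch of a naive fixed-length landmark codec: each 2D point is quantized to `bits` bits per coordinate, bit-packed, and unpacked on the receiver side, giving a raw rate of points × 2 × bits × fps bits per second. This is a hypothetical illustration, not the authors' encoding (the abstract does not describe it); the function names, the landmark count, the bit depth and the frame rate are all assumptions.

```python
from typing import List, Tuple

# Hypothetical payload helpers: a naive fixed-length landmark codec, not the paper's method.
Point = Tuple[float, float]
Code = Tuple[int, int]


def quantize(points: List[Point], width: int, height: int, bits: int = 8) -> List[Code]:
    """Sender side: map pixel coordinates to integers in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    codes = []
    for x, y in points:
        # Normalize by the frame size and clamp so off-frame points stay representable.
        u = min(max(x / (width - 1), 0.0), 1.0)
        v = min(max(y / (height - 1), 0.0), 1.0)
        codes.append((round(u * levels), round(v * levels)))
    return codes


def dequantize(codes: List[Code], width: int, height: int, bits: int = 8) -> List[Point]:
    """Receiver side: recover approximate pixel coordinates from integer codes."""
    levels = (1 << bits) - 1
    return [(cx / levels * (width - 1), cy / levels * (height - 1)) for cx, cy in codes]


def pack(codes: List[Code], bits: int = 8) -> bytes:
    """Concatenate all coordinates as `bits`-bit fields, zero-padded to a whole byte."""
    acc, nbits = 0, 0
    for cx, cy in codes:
        for c in (cx, cy):
            acc = (acc << bits) | c
            nbits += bits
    pad = (-nbits) % 8
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


def unpack(payload: bytes, num_points: int, bits: int = 8) -> List[Code]:
    """Inverse of pack(): split the bit string back into (x, y) code pairs."""
    nbits = num_points * 2 * bits
    # Drop the zero padding added by pack() before slicing out the fields.
    acc = int.from_bytes(payload, "big") >> (len(payload) * 8 - nbits)
    mask = (1 << bits) - 1
    values = [(acc >> (bits * (2 * num_points - 1 - i))) & mask for i in range(2 * num_points)]
    return list(zip(values[0::2], values[1::2]))


def raw_bitrate_kbps(num_points: int, bits: int, fps: float) -> float:
    """Uncompressed landmark bitrate: points * 2 coordinates * bits * frames per second."""
    return num_points * 2 * bits * fps / 1000.0


print(raw_bitrate_kbps(68, 8, 25))  # 27.2
```

With 68 points (a common facial-landmark layout), 8 bits per coordinate and 25 fps, this naive scheme already needs 27.2 kbit/s before any entropy coding or temporal prediction. Reaching the few kbits per second quoted in the abstract would therefore require fewer points, a lower frame rate, or additional compression, none of which this sketch models.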
