Deep generative models, and particularly facial animation schemes, can be used in video conferencing applications to efficiently compress a video through a sparse set of keypoints, without the need to transmit dense motion vectors. While these schemes bring significant coding gains over conventional video codecs at low bitrates, their performance saturates quickly when the available bandwidth increases. In this paper, we propose a layered, hybrid coding scheme to overcome this limitation. Specifically, we extend a codec based on facial animation by adding an auxiliary stream consisting of a very low bitrate version of the video, obtained through a conventional video codec (e.g., HEVC). The animated and auxiliary videos are combined through a novel fusion module. Our results show consistent average BD-Rate gains in excess of -30% on a large dataset of video conferencing sequences, extending the operational range of bitrates of a facial animation codec alone.