This paper explores facial expression representations in the compressed video domain that are free of inter-subject variation. Most previous methods process the decoded RGB frames of a sequence, even though the expression-related muscle movements are already embedded, off the shelf, in the compression format. In the compressed domain, which is up to two orders of magnitude smaller, we can explicitly infer the expression from the residual frames and extract identity factors from the I-frame with a pre-trained face recognition network. By enforcing marginal independence between the two, the expression feature is expected to be purer for expression and more robust to identity shifts. We need neither identity labels nor multiple expression samples from the same person to eliminate identity. Moreover, when the apex frame is annotated in the dataset, a complementary constraint can be added to further regularize this feature-level game. At test time, only the compressed residual frames are required for expression prediction. Our solution achieves comparable or better performance than recent decoded-image-based methods on typical FER benchmarks, with about 3$\times$ faster inference on compressed data.
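The abstract does not specify how the marginal independence between the expression and identity features is enforced. One common way to approximate such a constraint is to penalize the cross-covariance between the two feature batches; the sketch below illustrates this idea only. The function name and the NumPy formulation are hypothetical, not taken from the paper.

```python
import numpy as np

def cross_covariance_penalty(expr_feats: np.ndarray, id_feats: np.ndarray) -> float:
    """Squared Frobenius norm of the cross-covariance between two feature batches.

    expr_feats: (N, D_e) expression features from residual frames (hypothetical).
    id_feats:   (N, D_i) identity features from the I-frame (hypothetical).
    A value of 0 means the batch features are linearly decorrelated, a weak
    proxy for marginal independence.
    """
    e = expr_feats - expr_feats.mean(axis=0, keepdims=True)  # center each dim
    i = id_feats - id_feats.mean(axis=0, keepdims=True)
    cov = e.T @ i / (len(e) - 1)  # (D_e, D_i) cross-covariance matrix
    return float(np.sum(cov ** 2))
```

In training, such a penalty would be added to the expression classification loss so that the expression branch is discouraged from encoding identity information; the paper's actual constraint may differ (e.g., an adversarial or mutual-information-based formulation).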