To exploit the high temporal correlation among video frames of the same scene, the current frame is predicted from the already-encoded reference frames using block-based motion estimation and compensation techniques. While this approach can efficiently exploit the translational motion of moving objects, it is susceptible to other types of affine motion and to object occlusion/de-occlusion. Recently, deep learning has been used to model the high-level structure of human pose in specific actions from short videos and then to generate virtual frames at future time instances by predicting the pose using a generative adversarial network (GAN). Modelling the high-level structure of human pose can therefore exploit semantic correlation by predicting human actions and determining their trajectories. Video surveillance applications will benefit, as stored big surveillance data can be compressed by estimating human pose trajectories and generating future frames through semantic correlation. This paper explores a new way of video coding by modelling human pose from the already-encoded frames and using the generated frame at the current time instance as an additional forward-referencing frame. The proposed approach is expected to overcome the limitations of traditional backward-referencing frames by predicting the blocks containing moving objects with lower residuals. Experimental results show that the proposed approach achieves on average up to 2.83 dB PSNR gain and 25.93\% bitrate savings for high-motion video sequences.
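To illustrate the idea of using the GAN-generated frame as an additional reference, the following is a minimal sketch (not the paper's actual codec implementation) of per-block reference selection, assuming the backward motion-compensated prediction and the pose-based forward-referencing prediction for the current block are already available; the function names and block interface are hypothetical.

```python
import numpy as np

def sad(block_a: np.ndarray, block_b: np.ndarray) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(block_a.astype(np.int32) - block_b.astype(np.int32)).sum())

def select_reference(current_block: np.ndarray,
                     backward_pred_block: np.ndarray,
                     forward_pred_block: np.ndarray):
    """Choose, per block, between the traditional backward-referencing
    prediction and the prediction taken from the GAN-generated
    forward-referencing frame, keeping whichever yields the lower residual.

    Returns the chosen mode label and its residual block.
    """
    residual_backward = sad(current_block, backward_pred_block)
    residual_forward = sad(current_block, forward_pred_block)
    if residual_forward < residual_backward:
        # Blocks covering moving objects (occlusion/de-occlusion regions)
        # are expected to fall into this branch more often.
        return "forward", current_block.astype(np.int32) - forward_pred_block
    return "backward", current_block.astype(np.int32) - backward_pred_block
```

In a real encoder the choice would be signalled per block and made in a rate-distortion sense rather than by raw SAD; the sketch only shows how a lower-residual forward-referencing prediction can displace the backward one.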