Semantic communication addresses the limitations of the Shannon paradigm by transmitting meaning rather than exact representations, thereby reducing unnecessary resource consumption. This is particularly beneficial for video, which dominates network traffic and demands high bandwidth and power, making semantic approaches well suited to conserving resources while maintaining quality. In this paper, we propose a Predictability-aware and Entropy-adaptive Neural Motion Estimation (PENME) method to address the challenges of high latency, high bitrate, and power consumption in video transmission. PENME makes a per-frame decision to select a residual-motion extraction model (a convolutional neural network, a vision transformer, or optical flow) using a five-step policy based on motion strength, global motion consistency, peak sharpness, heterogeneity, and residual error. The residual motions are then transmitted to the receiver, where frames are reconstructed via motion-compensated updates. Next, a selective diffusion-based refinement step, the Latent Consistency Model (LCM-4), is applied to frames whose low predictability or large residuals trigger refinement, while predictable frames skip it. PENME also allocates radio resource blocks with awareness of residual motion and channel state, reducing power consumption and bandwidth usage while maintaining high semantic similarity. Simulation results on the Vimeo90K dataset show that PENME handles diverse video content and outperforms traditional communication, hybrid, and adaptive-bitrate semantic communication techniques, achieving 40% lower latency, 90% less transmitted data, and 35% higher throughput. On semantic communication metrics, PENME improves PSNR by about 40%, increases MS-SSIM by roughly 19%, and reduces LPIPS by nearly 35% compared with the baseline methods.
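The per-frame decision logic described above can be illustrated with a minimal sketch. The five statistics and the three extractor choices come from the abstract; the cascade ordering, the thresholds, and the function names (`select_extractor`, `needs_refinement`, `FrameStats`) are hypothetical placeholders, not PENME's actual policy.

```python
from dataclasses import dataclass

@dataclass
class FrameStats:
    # Per-frame statistics named in the abstract; values are assumed
    # to be precomputed and normalized to [0, 1] (an assumption).
    motion_strength: float
    global_consistency: float
    peak_sharpness: float
    heterogeneity: float
    residual_error: float

def select_extractor(s: FrameStats) -> str:
    """Illustrative five-step cascade; all thresholds are hypothetical."""
    if s.motion_strength < 0.1:        # 1) near-static frame:
        return "optical_flow"          #    cheap flow estimation suffices
    if s.global_consistency > 0.8:     # 2) coherent global (camera) motion
        return "optical_flow"
    if s.peak_sharpness > 0.7:         # 3) sharp, localized motion peak
        return "cnn"
    if s.heterogeneity > 0.6:          # 4) many mixed local motions:
        return "vit"                   #    long-range attention helps
    # 5) fall back on the residual error of a quick prediction
    return "cnn" if s.residual_error < 0.5 else "vit"

def needs_refinement(s: FrameStats, tau: float = 0.5) -> bool:
    """Trigger LCM-4 diffusion refinement only for hard frames:
    large residuals or low predictability; predictable frames skip it."""
    return s.residual_error > tau or s.global_consistency < 0.3
```

A receiver-side loop would run `select_extractor` once per frame, apply the motion-compensated update, and invoke the LCM-4 refinement only when `needs_refinement` returns true, which is what keeps the average per-frame cost low.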