Recent deep learning methods have led to increased interest in solving high-efficiency end-to-end transmission problems. These methods, which we call nonlinear transform source-channel coding (NTSCC), extract the semantic latent features of the source signal and learn an entropy model that guides variable-rate joint source-channel coding of those latent features over wireless channels. In this paper, we propose a comprehensive framework for improving NTSCC that achieves higher system coding gain, better model versatility, and a more flexible adaptation strategy aligned with semantic guidance. The improved NTSCC model is ready to support large-scale data interaction in emerging extended reality (XR) applications, which catalyzes the application of semantic communications. Specifically, we propose three improvements. First, we introduce a contextual entropy model that better captures the spatial correlations among the semantic latent features; this enables more accurate rate allocation and contextual joint source-channel coding, and hence higher coding gain. Second, building on this, we propose response network architectures that make NTSCC versatile, i.e., a single trained model supports various rates and channel states, which benefits practical deployment. Third, we propose an online latent feature editing method that enables more flexible coding rate control aligned with specific semantic guidance. Combining these three improvements yields a deployment-friendly semantic coded transmission system. Experiments show that our improved NTSCC system achieves a 16.35% channel bandwidth saving over the state-of-the-art engineered VTM + 5G LDPC coded transmission system, with lower processing latency.
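As a rough illustration of how an entropy model can guide variable-rate transmission, the Python sketch below maps each latent patch's estimated information content, -Σ log2 p, to a channel bandwidth chosen from a discrete set. It also builds a checkerboard split of the latent grid as one possible way to realize a contextual entropy model: anchor positions are coded first, and the remaining positions are conditioned on them. This is a minimal sketch under stated assumptions. `VALUE_SET`, `eta`, `allocate_bandwidth`, and `checkerboard_masks` are hypothetical names. The checkerboard context is a common choice in learned image compression and is not necessarily the one used in this work. The entropy model itself is abstracted away as given per-dimension likelihoods.

```python
import numpy as np

# Hypothetical set of admissible per-patch channel bandwidths (symbols per patch).
# The actual value set and scaling used by NTSCC are assumptions here.
VALUE_SET = np.array([16, 32, 48, 64, 96, 128, 192, 256])


def allocate_bandwidth(likelihoods: np.ndarray, eta: float) -> np.ndarray:
    """Map each latent patch's estimated rate to a channel bandwidth.

    likelihoods: array of shape (num_patches, dims) holding entropy-model
        probabilities P(y_i | context) for every latent dimension.
    eta: hypothetical scaling factor that trades channel rate for distortion.
    """
    # Avoid log2(0) for numerically vanishing probabilities.
    p = np.clip(likelihoods, 1e-9, 1.0)
    # Estimated information content of each patch in bits.
    bits = -np.log2(p).sum(axis=1)
    # Scale bits to a target number of channel symbols.
    target = eta * bits
    # Quantize each target to the nearest admissible bandwidth.
    idx = np.abs(target[:, None] - VALUE_SET[None, :]).argmin(axis=1)
    return VALUE_SET[idx]


def checkerboard_masks(h: int, w: int):
    """Split an h x w latent grid into anchor and non-anchor positions.

    Under this (assumed) checkerboard context, anchors are coded using side
    information only; non-anchors are then coded conditioned on decoded anchors.
    """
    ii, jj = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    anchor = (ii + jj) % 2 == 0
    return anchor, ~anchor
```

For example, `allocate_bandwidth(np.array([[0.5] * 4, [0.25] * 4]), eta=4.0)` estimates 4 and 8 bits for the two patches and assigns them 16 and 32 symbols, so patches whose content the entropy model finds less predictable receive more channel bandwidth. Quantizing to a small value set keeps the number of bandwidth configurations that a single trained model must handle finite.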