This paper presents a vision transformer (ViT) based joint source and channel coding (JSCC) scheme, called ViT-MIMO, for wireless image transmission over multiple-input multiple-output (MIMO) systems. In addition to outperforming separation-based benchmarks, the proposed ViT-MIMO architecture adapts flexibly to different channel conditions without retraining. Specifically, the self-attention mechanism of the ViT enables the model to adaptively learn the feature mapping and power allocation based on the source image and channel conditions. Numerical experiments show that ViT-MIMO can significantly improve transmission quality across a wide variety of scenarios, including varying channel conditions, making it an attractive solution for emerging semantic communication systems.
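To make the transmission setting concrete, the following is a minimal sketch of the kind of MIMO link a JSCC encoder's output symbols would pass through. The abstract does not specify the channel model, so everything here is an assumption for illustration: a narrowband channel y = Hx + n with i.i.d. Rayleigh fading entries, additive white Gaussian noise, and a unit average power constraint on the transmitted symbols. The helper names (`complex_gaussian`, `normalize_power`, `mimo_channel`) are hypothetical, and the ViT encoder and decoder themselves are not modelled.

```python
import math
import random


def complex_gaussian(rng, variance=1.0):
    """Draw one circularly symmetric complex Gaussian sample CN(0, variance)."""
    std = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


def normalize_power(x):
    """Scale the symbol vector so its average power per symbol equals 1."""
    avg_power = sum(abs(s) ** 2 for s in x) / len(x)
    scale = 1.0 / math.sqrt(avg_power)
    return [s * scale for s in x]


def mimo_channel(x, h, snr_db, rng):
    """Apply y = H x + n for one channel use (assumed model, not from the paper).

    The noise variance is set from the SNR under the unit-power assumption.
    """
    noise_var = 10.0 ** (-snr_db / 10.0)
    y = []
    for row in h:
        clean = sum(h_ij * x_j for h_ij, x_j in zip(row, x))
        y.append(clean + complex_gaussian(rng, noise_var))
    return y


rng = random.Random(0)
n_tx, n_rx = 2, 2  # assumed antenna counts, for illustration only

# Assumed Rayleigh fading: each entry of H is CN(0, 1).
h = [[complex_gaussian(rng) for _ in range(n_tx)] for _ in range(n_rx)]

# Placeholder symbols standing in for the encoder output of one channel use.
x = normalize_power([complex(0.3, -1.2), complex(2.0, 0.5)])

y = mimo_channel(x, h, snr_db=10.0, rng=rng)
```

In a scheme that adapts to channel conditions, the encoder and decoder would additionally receive some description of the channel (for example `h` and the SNR) as input. How ViT-MIMO does this is not detailed in the abstract, so the sketch stops at the channel itself.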