Deep learning (DL) based semantic communication methods have been explored in recent years for the efficient transmission of images, text, and speech. In contrast to traditional wireless communication methods, which focus on the transmission of abstract symbols, semantic communication approaches aim to improve transmission efficiency by sending only the semantically relevant information in the source data. In this paper, we consider semantic-oriented speech transmission, which transmits only the semantically relevant information over the channel for the speech recognition task, plus a compact additional set of semantically irrelevant information for the speech reconstruction task. We propose a novel end-to-end DL-based transceiver that extracts and encodes the semantic information from the input speech spectra at the transmitter and outputs the corresponding transcriptions from the decoded semantic information at the receiver. For speech-to-speech transmission, we further include a connectionist temporal classification (CTC) alignment module that extracts a small amount of additional semantically irrelevant but speech-related information to better reconstruct the original speech signals at the receiver. Simulation results confirm that the proposed method outperforms current methods in the accuracy of the predicted text for speech-to-text transmission and in the quality of the recovered speech signals for speech-to-speech transmission, while significantly improving transmission efficiency. More specifically, the proposed method sends only 16% of the transmitted symbols required by existing methods while achieving about a 10% reduction in word error rate (WER) for speech-to-text transmission. For speech-to-speech transmission, the efficiency gain is even larger: the method needs only 0.2% of the transmitted symbols required by the existing method.
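The abstract reports accuracy for speech-to-text transmission as word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch, assuming the common definition of WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not details taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (not from the paper); tokenizes on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words, i.e. a WER of 1/6.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, a lower value means the transcription produced at the receiver is closer to the reference text; the specific evaluation setup used in the paper is not given in the abstract.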