In this paper, we develop a deep learning (DL)-based semantic communication system for speech transmission, named DeepSC-ST. We consider two transmission tasks for the system: speech recognition and speech synthesis. First, a joint semantic-channel encoder extracts speech recognition-related semantic features for transmission, and the receiver recovers the text from the received semantic features. This significantly reduces the amount of data to be transmitted without degrading performance. Then, the receiver performs speech synthesis, regenerating the speech signal by feeding the recognized text and the speaker information into a neural network module. To make DeepSC-ST adaptive to dynamic channel environments, we identify a robust model that copes with different channel conditions. Simulation results show that the proposed DeepSC-ST significantly outperforms conventional communication systems and existing DL-enabled communication systems, especially in the low signal-to-noise ratio (SNR) regime. We further develop a software demonstration as a proof of concept of DeepSC-ST.
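The abstract does not state the channel model or the SNR range used to obtain the robust model. As a minimal sketch, assuming an additive white Gaussian noise (AWGN) channel with unit average transmit power (a common setup in DL-based semantic communication, but an assumption here), the NumPy code below shows how transmitted semantic features could be corrupted at a given SNR. It also shows one common way to expose a single model to many channel conditions: drawing a random SNR for each training batch. The function names, the feature shape, and the -5 to 20 dB range are hypothetical and not taken from DeepSC-ST.

```python
import numpy as np


def normalize_power(x: np.ndarray) -> np.ndarray:
    """Scale the symbols so their average power is 1 (assumed transmit power constraint)."""
    return x / np.sqrt(np.mean(np.abs(x) ** 2))


def awgn_channel(x: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical AWGN channel: add real Gaussian noise at the requested SNR (dB)."""
    x = normalize_power(x)
    # Signal power is 1 after normalization, so noise power = 10^(-SNR/10).
    noise_power = 10.0 ** (-snr_db / 10.0)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
    return x + noise


def sample_training_snr(rng: np.random.Generator,
                        low_db: float = -5.0, high_db: float = 20.0) -> float:
    """Draw a random SNR per batch so one model is trained across channel conditions.

    The default range is a hypothetical choice for illustration only.
    """
    return float(rng.uniform(low_db, high_db))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical encoder output: 4 frames x 128 real-valued symbols.
    features = rng.normal(size=(4, 128))
    snr_db = sample_training_snr(rng)
    received = awgn_channel(features, snr_db, rng)
    print(received.shape)  # (4, 128)
```

Normalizing power before adding noise keeps the SNR definition independent of the scale of the encoder output; in an actual training loop, the noisy features would be passed to the receiver's decoder rather than printed.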