Understanding the relationship between the tongue and oropharyngeal muscle deformation seen in tagged-MRI and intelligible speech plays an important role in advancing speech motor control theories and the treatment of speech-related disorders. Because the two modalities have heterogeneous representations, however, direct mapping between them -- i.e., a two-dimensional (mid-sagittal slice) plus time tagged-MRI sequence and its corresponding one-dimensional waveform -- is not straightforward. Instead, we use two-dimensional spectrograms, which contain both pitch and resonance information, as an intermediate representation, and develop an end-to-end deep learning framework that translates a sequence of tagged-MRI into its corresponding audio waveform with a limited dataset size. Our framework is based on a novel fully convolutional asymmetry translator guided by a self residual attention strategy that specifically exploits the moving muscular structures during speech. In addition, we leverage the pairwise correlation of samples with the same utterances through a latent space representation disentanglement strategy. Furthermore, we incorporate adversarial training with generative adversarial networks to improve the realism of the generated spectrograms. Our experimental results, carried out with a total of 63 tagged-MRI sequences alongside speech acoustics, showed that our framework enabled the generation of clear audio waveforms from a sequence of tagged-MRI, surpassing competing methods.
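To make the intermediate representation concrete, the following minimal sketch shows how a one-dimensional waveform can be turned into a two-dimensional (time by frequency) magnitude spectrogram with a short-time Fourier transform. It is an illustration only, not the preprocessing used in this work: the function names, the frame length of 64 samples, the hop of 32 samples, the Hann window, and the 1 kHz test tone are all assumptions chosen for readability.

```python
import cmath
import math


def hann(n):
    """Return a periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def magnitude_spectrogram(waveform, frame_len=64, hop=32):
    """Short-time Fourier magnitude of a 1-D waveform.

    Returns a list of frames; each frame holds frame_len // 2 + 1 magnitudes
    (frequency bins from 0 up to the Nyquist frequency).
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(waveform) - frame_len + 1, hop):
        segment = [waveform[start + i] * window[i] for i in range(frame_len)]
        spectrum = []
        for k in range(n_bins):
            # Naive DFT for clarity; a real pipeline would use an FFT library.
            acc = sum(
                segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Hypothetical test signal: a pure 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * t / sample_rate) for t in range(512)]

spec = magnitude_spectrogram(tone)
# Locate the strongest frequency bin in the first frame and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each row of `spec` is one time frame and each column one frequency bin, so the waveform becomes an image-like array that a convolutional translator can target. Recovering audio from such a magnitude spectrogram additionally requires a phase estimate, which this sketch does not cover.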