The quest for comprehensive generative models of intonation that link linguistic and paralinguistic functions to prosodic forms has been a longstanding challenge in speech communication research. Traditional intonation models have given way to the overwhelming performance of deep learning (DL) techniques, which train general-purpose end-to-end mappings with millions of tunable parameters. The shift towards black-box machine learning models has nonetheless raised the reverse problem: a compelling need to discover knowledge and to explain, visualise and interpret the models. Our work bridges a comprehensive generative model of intonation and state-of-the-art DL techniques. We build upon the modelling paradigm of the Superposition of Functional Contours (SFC) model and propose a Variational Prosody Model (VPM) that uses a network of variational contour generators to capture the context-sensitive variation of the constituent elementary prosodic contours. We show that the VPM can give insight into the intrinsic variability of these prosodic prototypes by learning a meaningfully structured prosodic latent space. We also show that the VPM is able to capture prosodic phenomena whose context-based variability spans multiple dimensions. Since it is based on the principle of superposition, the VPM does not require specially crafted corpora for the analysis, which opens up the possibility of using big data for prosody analysis. In a speech synthesis scenario, the model can be used to generate a dynamic and natural prosody contour that is devoid of averaging effects.
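The superposition principle referred to above can be illustrated with a minimal sketch. It is not the VPM implementation: the names `contour_generator` and `superpose`, the linear map standing in for a trained generator network, and the scalar latent variable are all assumptions made for illustration. Each function contributes one elementary contour over its scope, shaped by a sample from its latent distribution, and the overall contour is the sum of the overlapping contributions.

```python
import numpy as np

def contour_generator(z, n_units, weights):
    """Hypothetical stand-in for a variational contour generator.

    Maps a scalar latent sample z and the relative position of each unit
    within the scope to one pitch offset per unit. A trained network would
    replace this linear map (assumption).
    """
    ramp = np.linspace(0.0, 1.0, n_units)            # relative position in scope
    features = np.stack([ramp, 1.0 - ramp, np.full(n_units, z)], axis=1)
    return features @ weights                         # shape (n_units,)

def superpose(n_units_total, functions, rng):
    """Sum the elementary contours of all functions over their scopes.

    Each entry of `functions` is (start, end, mu, sigma, weights), where
    mu and sigma describe the latent distribution of that function.
    """
    contour = np.zeros(n_units_total)
    for start, end, mu, sigma, weights in functions:
        z = mu + sigma * rng.standard_normal()        # reparameterised latent sample
        contour[start:end] += contour_generator(z, end - start, weights)
    return contour

rng = np.random.default_rng(0)
functions = [
    # A contour spanning the whole utterance (illustrative values only).
    (0, 4, 0.0, 0.0, np.array([1.0, 0.0, 0.0])),
    # A contour over the last two units only, overlapping the first.
    (2, 4, 1.0, 0.0, np.array([0.0, 0.0, 2.0])),
]
contour = superpose(4, functions, rng)
```

With `sigma` set to zero, as here, each generator returns its prototype contour. A non-zero `sigma` would draw a different variant on every call, which is one way to read the abstract's claim about avoiding averaging effects.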