Models of acoustic word embeddings (AWEs) learn to map variable-length spoken word segments onto fixed-dimensionality vector representations such that different acoustic exemplars of the same word are projected nearby in the embedding space. In addition to their speech technology applications, AWE models have been shown to predict human performance on a variety of auditory lexical processing tasks. Current AWE models are based on neural networks and trained in a bottom-up approach that integrates acoustic cues to build up a word representation given an acoustic or symbolic supervision signal. Therefore, these models do not leverage or capture high-level lexical knowledge during the learning process. In this paper, we propose a multi-task learning model that incorporates top-down lexical knowledge into the training procedure of AWEs. Our model learns a mapping between the acoustic input and a lexical representation that encodes high-level information such as word semantics, in addition to bottom-up form-based supervision. We experiment with three languages and demonstrate that incorporating lexical knowledge improves the discriminability of the embedding space and encourages the model to better separate lexical categories.