In speech communication, how something is said (paralinguistic information) is as crucial as what is said (linguistic information). As a type of paralinguistic information, English speech uses sentence stress, the heaviest prominence within a sentence, to convey emphasis. Although different placements of sentence stress communicate different emphatic implications, current speech translation systems return the same translation whenever utterances are linguistically identical, and so lose the paralinguistic information. Concentrating on focus, a type of emphasis, we propose mapping paralinguistic information into the linguistic domain within the source language using lexical and grammatical devices. This method enables us to translate the paraphrased text representations instead of the transcription of the original speech, and thus to obtain translations that preserve paralinguistic information. As a first step, we present the collection of an English corpus containing speech that differs in the placement of focus, along with corresponding text designed to reflect the implied meaning of the speech. Analyses of our corpus demonstrate that mapping focus from the paralinguistic domain into the linguistic domain involves various lexical and grammatical methods. The data and insights from our analysis will further advance research into paralinguistic translation. The corpus will be published via LDC and our website.