Neural machine translation (NMT) has the drawback that it can generate only high-frequency words, owing to the computational cost of the softmax function in the output layer. In Japanese-English NMT, Japanese predicate conjugation increases the vocabulary size: a single verb can have as many as 19 surface forms, so the vocabulary list fills up with inflected variants of verbs. In this research, we focus on predicate conjugation to compress the Japanese vocabulary. We propose methods that use predicate conjugation information without discarding linguistic information. The proposed methods can generate low-frequency words and handle unknown words. We consider two ways of introducing conjugation information: the first treats it as a separate token (conjugation token), and the second treats it as an embedded vector (conjugation feature). Experiments show that these methods compress the vocabulary size by approximately 86.1% on the Tanaka corpus and that the NMT models can output words that do not appear in the training data. Furthermore, on ASPEC, BLEU scores improved by 0.91 points for Japanese-to-English translation and by 0.32 points for English-to-Japanese translation.
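To make the two schemes concrete, here is a minimal sketch of how they could reshape the vocabulary. It assumes tokens have already been analyzed (for example by a morphological analyzer) into (surface form, base form, conjugation form) triples; the function names, the romanized labels such as `mizen` and `katei`, and the tiny hand-written verb table are hypothetical illustrations, not the authors' implementation or data.

```python
from typing import List, Optional, Set, Tuple

# Assumed input format (hypothetical): each token is already analyzed into
# (surface form, base form, conjugation form); None marks non-conjugating words.
Token = Tuple[str, str, Optional[str]]


def conjugation_token_sequence(tokens: List[Token]) -> List[str]:
    """Conjugation-token scheme: emit the base form, then a separate conjugation token."""
    out: List[str] = []
    for surface, base, conj in tokens:
        if conj is None:
            out.append(surface)
        else:
            out.extend([base, f"<{conj}>"])
    return out


def conjugation_feature_sequence(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Conjugation-feature scheme: one (base form, conjugation label) pair per word.
    In a model, the label would index its own embedding table alongside the word."""
    return [(base if conj is not None else surface, f"<{conj or 'none'}>")
            for surface, base, conj in tokens]


# Toy, hand-labeled data (hypothetical): four conjugation forms of three godan verbs.
FORMS = ["mizen", "renyo", "shushi", "katei"]
VERBS = {
    "書く": ["書か", "書き", "書く", "書け"],
    "読む": ["読ま", "読み", "読む", "読め"],
    "話す": ["話さ", "話し", "話す", "話せ"],
}
tokens: List[Token] = [
    (surface, base, form)
    for base, surfaces in VERBS.items()
    for surface, form in zip(surfaces, FORMS)
]

surface_vocab: Set[str] = {surface for surface, _, _ in tokens}
token_vocab: Set[str] = set(conjugation_token_sequence(tokens))
feature_word_vocab: Set[str] = {w for w, _ in conjugation_feature_sequence(tokens)}

print(len(surface_vocab))       # 12 distinct surface forms
print(len(token_vocab))         # 3 base forms + 4 conjugation tokens = 7
print(len(feature_word_vocab))  # 3 base forms; labels live in a separate table
```

The surface vocabulary grows with every (verb, form) combination, while both schemes grow only with the number of base forms plus a small, fixed set of conjugation labels. Because the output side predicts a base form and a conjugation label separately, a model could in principle produce a combination such as 読む with `<katei>` even if 読め never occurred in training; mapping that pair back to a surface form would need an inflection step, which this sketch does not include.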