A neural probabilistic language model (NPLM) offers a way to achieve better perplexity than n-gram language models and their smoothed variants. This paper investigates its application to bilingual NLP, specifically Statistical Machine Translation (SMT). We focus on the perspective that NPLM has the potential to complement `resource-constrained' bilingual resources with potentially `huge' monolingual resources. We introduce an ngram-HMM language model as an NPLM using a non-parametric Bayesian construction. To facilitate its application to various tasks, we propose a joint space model of the ngram-HMM language model. We report an experiment on system combination in SMT. One finding is that our treatment of noise improved the results by 0.20 BLEU points when the NPLM is trained on a relatively small corpus, in our case 500,000 sentence pairs, which is often the case because of the long training time of NPLM.