We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference. Our approach trains two models: a discriminative parser based on a bracketing transduction grammar whose derivation tree hierarchically aligns source and target phrases, and a neural seq2seq model that learns to translate the aligned phrases one by one. We use the same seq2seq model to translate at all phrase scales, which results in two inference modes: one mode in which the parser is discarded and only the seq2seq component is used at the sequence level, and another in which the parser is combined with the seq2seq model. Decoding in the latter mode is done with the cube-pruned CKY algorithm, which is more involved but can make use of new translation rules during inference. We formalize our model as a source-conditioned synchronous grammar and develop an efficient variational inference algorithm for training. When applied on top of both randomly initialized and pretrained seq2seq models, we find that both inference modes perform well compared to baselines on small-scale machine translation benchmarks.
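To make the latter inference mode concrete, below is a minimal sketch of CKY decoding under a bracketing transduction grammar, where each chart cell holds hypotheses for a source span built either from a known phrase pair or by combining adjacent sub-spans in straight or inverted order. This is an illustrative assumption, not the paper's implementation: `score_phrase`, `Hypothesis`, the toy phrase table, and the beam size are all hypothetical, and full cube pruning is replaced by a simple per-cell beam for brevity.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical stand-in for the trained seq2seq phrase scorer: a real
# system would score a candidate target phrase given a source span with
# the neural model. Here, a toy length-match score.
def score_phrase(src_phrase: tuple[str, ...], tgt_phrase: tuple[str, ...]) -> float:
    return -abs(len(src_phrase) - len(tgt_phrase))

@dataclass(order=True)
class Hypothesis:
    score: float
    target: tuple[str, ...] = field(compare=False)

def cky_decode(src: list[str], phrase_table: dict, beam: int = 4):
    """Simplified CKY over a bracketing transduction grammar (BTG):
    each cell keeps a beam of target-side hypotheses for a source span."""
    n = len(src)
    chart: dict[tuple[int, int], list[Hypothesis]] = {}
    for width in range(1, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cands: list[Hypothesis] = []
            # Lexical rule: translate the whole span as one phrase, if known.
            # New phrase pairs added to the table are usable at inference time.
            for tgt in phrase_table.get(tuple(src[i:j]), []):
                cands.append(Hypothesis(score_phrase(tuple(src[i:j]), tgt), tgt))
            # BTG rules: combine adjacent sub-spans, straight or inverted.
            for k in range(i + 1, j):
                for l in chart.get((i, k), []):
                    for r in chart.get((k, j), []):
                        s = l.score + r.score
                        cands.append(Hypothesis(s, l.target + r.target))  # straight
                        cands.append(Hypothesis(s, r.target + l.target))  # inverted
            # Per-cell beam: a crude stand-in for cube pruning.
            chart[(i, j)] = heapq.nlargest(beam, cands)
    return chart.get((0, n), [])

# Toy usage with a hypothetical phrase table.
table = {("guten", "tag"): [("good", "day")], ("guten",): [("good",)],
         ("tag",): [("day",)]}
print(cky_decode(["guten", "tag"], table)[0].target)  # ('good', 'day')
```

Cube pruning would replace the exhaustive inner loops with a best-first pop over the per-split "cubes" of sub-span hypotheses, visiting only the most promising combinations; the beam above trades that efficiency for simplicity.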