Humans can learn structural properties of a word from minimal experience and deploy their learned syntactic representations uniformly across grammatical contexts. We assess the ability of modern neural language models to reproduce this behavior in English and evaluate the effect of structural supervision on learning outcomes. First, we assess few-shot learning capabilities by developing controlled experiments that probe models' syntactic generalizations of nominal number and verbal argument structure for tokens seen as few as two times during training. Second, we assess invariance properties of the learned representations: the ability of a model to transfer syntactic generalizations from a base context (e.g., a simple declarative active-voice sentence) to a transformed context (e.g., an interrogative sentence). We test four models trained on the same dataset: an n-gram baseline, an LSTM, and two LSTM variants trained with explicit structural supervision (Dyer et al., 2016; Charniak et al., 2016). We find that in most cases the neural models induce the proper syntactic generalizations after minimal exposure, often from just two examples during training, and that the two structurally supervised models generalize more accurately than the LSTM. All neural models are able to leverage information learned in base contexts to drive expectations in transformed contexts, indicating that they have learned some invariance properties of syntax.
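The kind of targeted evaluation the abstract describes compares a model's surprisal at an agreeing versus a non-agreeing continuation. The sketch below illustrates only the evaluation logic with a toy add-one-smoothed bigram model on a hypothetical four-sentence corpus; it does not reproduce the paper's models, data, or few-shot protocol.

```python
import math
from collections import defaultdict

# Toy corpus (an illustrative assumption, not the paper's training data).
corpus = [
    "the dog is here",
    "the dogs are here",
    "the cat is here",
    "the cats are here",
]

# Train a bigram model with add-one smoothing.
counts = defaultdict(lambda: defaultdict(int))
vocab = set()
for sent in corpus:
    toks = ["<s>"] + sent.split()
    vocab.update(toks)
    for prev, cur in zip(toks, toks[1:]):
        counts[prev][cur] += 1

def surprisal(prev, word):
    """Surprisal -log2 P(word | prev) under the smoothed bigram model."""
    total = sum(counts[prev].values()) + len(vocab)
    return -math.log2((counts[prev][word] + 1) / total)

# Number-agreement test: a model that has learned that "dogs" is plural
# should find the agreeing verb less surprising than the violating one.
s_agree = surprisal("dogs", "are")
s_violate = surprisal("dogs", "is")
print(f"surprisal(are | dogs) = {s_agree:.2f} bits")
print(f"surprisal(is  | dogs) = {s_violate:.2f} bits")
```

The paper's base-versus-transformed-context comparison would apply the same surprisal contrast in, say, an interrogative frame, checking whether the preference learned in the declarative context carries over.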