Can we construct a neural model that is inductively biased towards learning human languages? Motivated by this question, we aim to construct an informative prior over neural weights, in order to adapt quickly to held-out languages in the task of character-level language modeling. We infer this distribution from a sample of typologically diverse training languages via a Laplace approximation. Models endowed with such a prior outperform baselines with an uninformative prior (so-called "fine-tuning") in both zero-shot and few-shot settings, suggesting that the prior is imbued with universal phonological knowledge. Moreover, we harness additional language-specific side information as distant supervision for held-out languages. Specifically, we condition language models on features from typological databases, by concatenating them to hidden states or generating weights with hyper-networks. These features appear beneficial in the few-shot setting, but not in the zero-shot setting. Since the paucity of digital texts affects the majority of the world's languages, we hope that these findings will help broaden the scope of applications for language technology.
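The abstract does not spell out how the Laplace-approximated prior is estimated or applied, so the following is only a minimal PyTorch sketch of one common instantiation: a diagonal Fisher estimate over the training languages, used as a quadratic (Gaussian log-prior) penalty when adapting to a held-out language. All names here (`CharLM`, `diag_fisher`, `laplace_penalty`, `prior_mean`) are hypothetical, and the paper's actual estimation procedure may differ.

```python
# Sketch: a Laplace-approximated prior over neural weights as a quadratic
# regulariser when adapting a character-level LM to a held-out language.
import torch
import torch.nn as nn


class CharLM(nn.Module):
    """Toy character-level language model (LSTM over character embeddings)."""

    def __init__(self, vocab_size: int, dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, chars):                      # chars: (batch, seq_len)
        h, _ = self.rnn(self.embed(chars))
        return self.out(h)                         # logits: (batch, seq_len, vocab)


def diag_fisher(model, batches, loss_fn):
    """Diagonal Fisher estimate: average squared gradients of the loss over
    batches drawn from the training languages (diagonal Laplace approximation)."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for chars, targets in batches:
        model.zero_grad()
        loss = loss_fn(model(chars).flatten(0, 1), targets.flatten())
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(batches), 1) for n, f in fisher.items()}


def laplace_penalty(model, prior_mean, fisher, strength=1.0):
    """Quadratic penalty pulling adapted weights towards the prior mean,
    weighted by the diagonal Fisher: the log of the Gaussian prior."""
    penalty = 0.0
    for n, p in model.named_parameters():
        penalty = penalty + (fisher[n] * (p - prior_mean[n]) ** 2).sum()
    return strength * penalty
```

During few-shot adaptation, this penalty would simply be added to the language-modeling loss on the held-out language's data; in the zero-shot case, the prior mean itself serves as the model.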
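Likewise, a minimal sketch (under the same hypothetical setup as above) of the two conditioning strategies mentioned in the abstract: concatenating a per-language typological feature vector to the hidden states, or generating the output-layer weights from it with a hyper-network. The feature vector is assumed to be a fixed per-language input drawn from a typological database; `ConcatConditionedLM` and `HyperConditionedLM` are illustrative names, not the paper's modules.

```python
# Sketch: conditioning a character-level LM on typological features,
# via concatenation to hidden states or a weight-generating hyper-network.
import torch
import torch.nn as nn


class ConcatConditionedLM(nn.Module):
    """Concatenate language features to each hidden state before prediction."""

    def __init__(self, vocab_size: int, dim: int, feat_dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim + feat_dim, vocab_size)

    def forward(self, chars, lang_feats):          # lang_feats: (batch, feat_dim)
        h, _ = self.rnn(self.embed(chars))         # (batch, seq, dim)
        feats = lang_feats.unsqueeze(1).expand(-1, h.size(1), -1)
        return self.out(torch.cat([h, feats], dim=-1))


class HyperConditionedLM(nn.Module):
    """Generate the output projection from language features with a hyper-network."""

    def __init__(self, vocab_size: int, dim: int, feat_dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        # Hyper-network: maps language features to a (vocab_size x dim) weight matrix.
        self.hyper = nn.Linear(feat_dim, vocab_size * dim)
        self.vocab_size, self.dim = vocab_size, dim

    def forward(self, chars, lang_feats):
        h, _ = self.rnn(self.embed(chars))                        # (batch, seq, dim)
        W = self.hyper(lang_feats).view(-1, self.vocab_size, self.dim)
        return torch.einsum("bsd,bvd->bsv", h, W)                 # per-language logits
```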