Syntax is fundamental to our thinking about language. Failing to capture the structure of the input language can lead to generalization problems and over-parametrization. In the present work, we propose a new syntax-aware language model: Syntactic Ordered Memory (SOM). The model explicitly models structure with an incremental parser while maintaining the left-to-right conditional probability setting of a standard language model. To train the incremental parser and avoid exposure bias, we also propose a novel dynamic oracle, so that SOM is more robust to incorrect parsing decisions. Experiments show that SOM achieves strong results in language modeling, incremental parsing, and syntactic generalization tests, while using fewer parameters than other models.
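For reference, the "left-to-right conditional probability setting" is the standard autoregressive factorization of the joint probability of a sentence; in the usual notation (not spelled out in the abstract), SOM, like any standard language model, models

\[
p(x_1, \dots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_1, \dots, x_{t-1}),
\]

where each token is predicted from its left context only; what distinguishes SOM is that the incremental parser's structural decisions condition these per-token predictions.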