We test the hypothesis that discourse predictability influences syntactic choice in Hindi. While prior work has shown that several factors (e.g., information status, dependency length, and syntactic surprisal) influence Hindi word order preferences, the role of discourse predictability remains underexplored. Inspired by prior work on syntactic priming, we investigate how the words and syntactic structures in a sentence influence the word order of the following sentences. Specifically, we extract sentences from the Hindi-Urdu Treebank corpus (HUTB), permute the preverbal constituents of those sentences, and build a classifier to distinguish the sentences that actually occurred in the corpus from the artificially generated distractors. The classifier draws on a number of discourse-based and cognitive features, including dependency length, surprisal, and information status. We find that information status and LSTM-based discourse predictability influence word order choices, especially for non-canonical object-fronted orders. We conclude by situating our results within the broader syntactic priming literature.
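The variant-generation step described above (permuting preverbal constituents to obtain distractors for a corpus sentence) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `generate_variants`, the `max_variants` cap, and the placeholder constituent labels are all assumptions made for illustration, and real HUTB sentences would first need their preverbal constituents extracted from the dependency trees.

```python
from itertools import permutations


def generate_variants(preverbal, verb, max_variants=None):
    """Pair a reference order with distractors built by permuting
    the preverbal constituents while keeping the verb final.

    Hypothetical helper: names and the optional cap are illustrative
    assumptions, not taken from the paper.
    """
    reference = tuple(preverbal) + (verb,)
    seen = {reference}  # skip the corpus order and any repeated orders
    distractors = []
    for perm in permutations(preverbal):
        candidate = perm + (verb,)
        if candidate in seen:
            continue
        seen.add(candidate)
        distractors.append(candidate)
        if max_variants is not None and len(distractors) >= max_variants:
            break
    return reference, distractors


# Toy input: placeholder labels stand in for real Hindi constituents.
reference, distractors = generate_variants(["SUBJ", "IOBJ", "OBJ"], "VERB")
for order in distractors:
    print(" ".join(order))
```

Each (reference, distractor) pair would then be passed to the classifier together with features such as dependency length, surprisal, and information status; that stage is omitted here because the abstract does not specify it.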