Ambiguity is a natural language phenomenon that occurs at the levels of syntax, semantics, and pragmatics. It is widely studied; in psycholinguistics, for instance, there are a variety of competing studies of human disambiguation processes. These studies are empirical and based on eye-tracking measurements. Here we take first steps towards formalizing these processes for semantic ambiguities, in which we identified two features: (1) joint plausibility degrees of the different possible interpretations, and (2) causal structures according to which certain words play a more substantial role in the process. The novel sheaf-theoretic model of definite causality developed by Gogioso and Pinzani in QPL 2021 offers tools to model and reason about these features. We applied this theory to a dataset of ambiguous phrases extracted from the psycholinguistics literature, together with human plausibility judgements that we collected using Amazon Mechanical Turk. We measured the causal fractions of different disambiguation orders within the phrases and discovered two prominent orders: from subject to verb in subject-verb phrases, and from object to verb in verb-object phrases. We also found evidence for a delay in the disambiguation of polysemous versus homonymous verbs, again compatible with psycholinguistic findings.