Modal verbs, such as "can", "may", and "must", are commonly used in daily communication to convey the speaker's perspective on the likelihood and/or mode of a proposition. Their meaning can differ greatly depending on usage and sentence context (e.g., "They 'must' help each other out." vs. "They 'must' have helped each other out."). Despite their practical importance in natural language understanding, linguists have yet to agree on a single, prominent framework for categorizing modal verb senses. This lack of agreement stems from the high flexibility and polysemy of modal verbs, which makes it more difficult for researchers to incorporate insights from this family of words into their work. This work presents the Moverb dataset, which consists of 27,240 annotations of modal verb senses over 4,540 utterances, each containing one or more sentences from social conversations. Each utterance is annotated by three annotators using two different theoretical frameworks of modal verb senses (i.e., Quirk and Palmer), which accounts for the total of 4,540 × 3 × 2 = 27,240 annotations. We observe that both frameworks have similar inter-annotator agreement, despite having different numbers of sense types (8 for Quirk and 3 for Palmer). With RoBERTa-based classifiers fine-tuned on Moverb, we achieve F1 scores of 82.2 and 78.3 on Quirk and Palmer, respectively, showing that modal verb sense disambiguation is not a trivial task. Our dataset will be publicly available with our final version.