Argumentation analysis is a field of computational linguistics that studies methods for extracting arguments and the relations between them from texts, as well as for building the argumentation structure of texts. This paper is the organizers' report on the first competition of argumentation analysis systems for Russian-language texts, held within the framework of the Dialogue conference. The competition offered participants two tasks: stance detection and argument classification. A corpus of 9,550 sentences (comments on social media posts) on three topics related to the COVID-19 pandemic (vaccination, quarantine, and wearing masks) was prepared, annotated, and used for training and testing. The system that won first place in both tasks combined three techniques: the NLI (Natural Language Inference) variant of the BERT architecture; automatic translation into English, which made it possible to apply a specialized BERT model retrained on Twitter posts discussing COVID-19; and additional masking of target entities. This system achieved an F1-score of 0.6968 on the stance detection task and an F1-score of 0.7404 on the argument classification task. We hope that the prepared dataset and baselines will help foster further research on argument mining for the Russian language.
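The abstract only names the ingredients of the winning system. The following sketch illustrates, under stated assumptions, one common way to cast stance detection as NLI: the comment serves as the premise, one hypothesis is written per stance label, explicit mentions of the target are replaced with a mask token, and the label whose hypothesis receives the highest entailment score is predicted. The label set, the hypothesis templates, the `[TARGET]` mask token, and the names `mask_target`, `build_nli_pairs`, and `predict_stance` are all hypothetical. `entail_prob` stands in for a real NLI model (for example, a BERT model fine-tuned for NLI), and the translation step is omitted.

```python
import re
from typing import Callable

# Hypothetical mask token; the abstract does not specify how targets were masked.
MASK = "[TARGET]"

# Hypothetical hypothesis templates, one per stance label. The abstract does
# not say which templates or label set the winning system used.
HYPOTHESES = {
    "for": "The author supports {target}.",
    "against": "The author opposes {target}.",
    "neutral": "The author expresses no stance on {target}.",
}


def mask_target(text: str, target_terms: list[str]) -> str:
    """Replace whole-word, case-insensitive mentions of target terms with MASK."""
    # Longer terms first, so "face masks" is masked as one unit before "masks".
    for term in sorted(target_terms, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        text = re.sub(pattern, MASK, text, flags=re.IGNORECASE)
    return text


def build_nli_pairs(comment: str, target: str,
                    target_terms: list[str]) -> dict[str, tuple[str, str]]:
    """Map each stance label to a (premise, hypothesis) pair for an NLI model."""
    premise = mask_target(comment, target_terms)
    return {label: (premise, tpl.format(target=target))
            for label, tpl in HYPOTHESES.items()}


def predict_stance(comment: str, target: str, target_terms: list[str],
                   entail_prob: Callable[[str, str], float]) -> str:
    """Pick the label whose hypothesis the NLI scorer finds most entailed."""
    pairs = build_nli_pairs(comment, target, target_terms)
    return max(pairs, key=lambda label: entail_prob(*pairs[label]))


def toy_scorer(premise: str, hypothesis: str) -> float:
    # Stand-in for a real NLI model's entailment probability (demo only).
    return 0.9 if "supports" in hypothesis and "glad" in premise.lower() else 0.1


print(predict_stance("I got the vaccine and I'm glad.", "vaccination",
                     ["vaccine"], toy_scorer))  # prints "for"
```

In a real pipeline, `entail_prob` would wrap an NLI classifier applied to the English translation of each comment; the keyword-based `toy_scorer` only demonstrates the control flow. Framing every label as a hypothesis lets a single entailment model score all stances with the same weights.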