LESA: Linguistic Encapsulation and Semantic Amalgamation Based Generalised Claim Detection from Online Content

The conceptualization of a claim lies at the core of argument mining. Segregating claims is difficult because textual syntax and context diverge across different distributions. Another pressing issue is the unavailability of labeled unstructured text for experimentation. In this paper, we propose LESA, a framework that addresses the former issue by assembling a source-independent generalized model that captures syntactic features through part-of-speech and dependency embeddings, and contextual features through a fine-tuned language model. We address the latter issue by annotating a Twitter dataset, which provides a testing ground on a large unstructured dataset. Experimental results show that LESA improves upon the state-of-the-art performance across six benchmark claim datasets by an average of 3 claim-F1 points for in-domain experiments and by 2 claim-F1 points for general-domain experiments. On our dataset too, LESA outperforms existing baselines by 1 claim-F1 point on the in-domain experiments and 2 claim-F1 points on the general-domain experiments. We also release comprehensive data annotation guidelines compiled during the annotation phase, which were missing from the current literature.
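The results above are reported in claim-F1 points. As a minimal sketch, assuming claim-F1 means the F1 score computed on the claim class alone and that one point is one unit on a 0 to 100 scale, the metric could be computed as follows. The function name `claim_f1`, the label encoding (1 for claim, 0 for non-claim), and the toy labels are illustrative assumptions, not taken from the paper.

```python
def claim_f1(gold, pred, claim_label=1):
    """F1 score restricted to the claim class (assumed reading of 'claim-F1')."""
    pairs = list(zip(gold, pred))
    # True positives: claims correctly predicted as claims.
    tp = sum(1 for g, p in pairs if g == claim_label and p == claim_label)
    # False positives: non-claims predicted as claims.
    fp = sum(1 for g, p in pairs if g != claim_label and p == claim_label)
    # False negatives: claims predicted as non-claims.
    fn = sum(1 for g, p in pairs if g == claim_label and p != claim_label)
    if tp == 0:
        # No correct claim prediction: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only (1 = claim, 0 = non-claim).
gold = [1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 1, 0]

score = claim_f1(gold, pred)
print(f"claim-F1 = {100 * score:.1f}")
```

Under this reading, an improvement of 3 claim-F1 points would correspond to, for example, a move from 66.7 to 69.7 on the scale printed here.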

