Argument mining aims to automatically detect argumentative components and identify the relationships between them. Although it is a thriving field of natural language processing with many corpora available for academic study and application development, research in this area is still constrained by the inherent limitations of existing datasets. Specifically, all publicly available datasets are relatively small in scale, and few of them provide information from other modalities to facilitate learning. Moreover, statements and expressions in these corpora usually take a compact form, so non-adjacent clauses or text segments are always treated as separate components, which restricts the generalization ability of models. To address these limitations, we collect and contribute a novel dataset, AntCritic, which consists of about 10k free-form and visually rich financial comments and supports both argument component detection and argument relation prediction. To cope with the challenges brought by this expanded scenario and modified problem setting, we thoroughly explore fine-grained relation prediction and structure reconstruction schemes for free-form documents and discuss encoding mechanisms for visual styles and layouts. Based on these analyses, we design two simple but effective model architectures and conduct various experiments on the dataset to provide benchmark performance as a reference and to verify the practicality of the proposed architectures.