For text-level discourse analysis, there are various discourse annotation schemes but relatively little labeled data, because discourse research is still immature and annotating the inner logic of a text is labor-intensive. In this paper, we attempt to unify multiple Chinese discourse corpora built under different annotation schemes within a discourse dependency framework, by designing semi-automatic methods that convert them into dependency structures. We also implement several benchmark dependency parsers and investigate how they can leverage the unified data to improve performance.
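The abstract does not spell out the conversion rules. Purely as an illustration, the sketch below shows one commonly used way to turn a nuclearity-annotated discourse tree (RST-style) into dependency arcs: the head of each relation is its leftmost nucleus, and the head of every other child depends on it. The `EDU` and `Relation` classes, the `"N"`/`"S"` nuclearity markers, the artificial root `0`, and the leftmost-nucleus rule are all assumptions made for this example. This is not necessarily the semi-automatic method used in the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class EDU:
    """A leaf: one elementary discourse unit, identified by its 1-based position."""
    index: int


@dataclass
class Relation:
    """An internal node: a discourse relation over two or more sub-spans.

    Hypothetical representation: nuclearity[i] is "N" (nucleus) or "S"
    (satellite) for children[i].
    """
    label: str
    children: List["Tree"]
    nuclearity: List[str]


Tree = Union[EDU, Relation]


def head_of(node: Tree) -> int:
    """Head EDU of a span: follow the leftmost nucleus down to a leaf (assumed rule)."""
    if isinstance(node, EDU):
        return node.index
    for child, nuc in zip(node.children, node.nuclearity):
        if nuc == "N":
            return head_of(child)
    raise ValueError(f"relation {node.label!r} has no nucleus")


def to_dependencies(tree: Tree) -> List[Tuple[int, int, str]]:
    """Return (head, dependent, relation) arcs, sorted by dependent.

    The head EDU of the whole text attaches to an artificial root 0.
    """
    arcs = [(0, head_of(tree), "ROOT")]

    def visit(node: Tree) -> None:
        if isinstance(node, EDU):
            return
        head = head_of(node)
        for child in node.children:
            child_head = head_of(child)
            # The child that supplies the head is not attached to itself;
            # every other child's head depends on the relation's head.
            if child_head != head:
                arcs.append((head, child_head, node.label))
            visit(child)

    visit(tree)
    return sorted(arcs, key=lambda arc: arc[1])


if __name__ == "__main__":
    # Hypothetical three-EDU text: EDU 1 is a satellite (Cause) of EDU 2,
    # and EDU 3 elaborates on the span [1, 2].
    tree = Relation(
        "Elaboration",
        [Relation("Cause", [EDU(1), EDU(2)], ["S", "N"]), EDU(3)],
        ["N", "S"],
    )
    print(to_dependencies(tree))
```

On the hypothetical example this prints `[(2, 1, 'Cause'), (0, 2, 'ROOT'), (2, 3, 'Elaboration')]`: EDU 2, the nucleus of both relations, becomes the head of the whole text. Under the leftmost-nucleus rule, a multinuclear relation such as a coordination attaches its later nuclei to the first one; other conventions are possible.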