With a growing need for robust and general discourse structures in many downstream tasks and real-world applications, the current lack of high-quality, high-quantity discourse trees is a severe shortcoming. To alleviate this limitation, we propose a new strategy to generate tree structures in a task-agnostic, unsupervised fashion by extending a latent tree induction framework with an auto-encoding objective. The proposed approach can be applied to any tree-structured objective, such as syntactic parsing or discourse parsing. However, since annotating discourse trees is especially difficult, we initially develop the method to complement task-specific models in generating much larger and more diverse discourse treebanks.