Recent neural supervised topic segmentation models achieve superior effectiveness over unsupervised methods, thanks to the availability of large-scale training corpora sampled from Wikipedia. These models may, however, suffer from limited robustness and transferability because they exploit simple linguistic cues for prediction while overlooking more important inter-sentential topical consistency. To address this issue, we present a discourse-aware neural topic segmentation model that injects above-sentence discourse dependency structures to encourage the model to base its topic boundary predictions more on topical consistency between sentences. Our empirical study on English evaluation datasets shows that injecting above-sentence discourse structures into a neural topic segmenter with our proposed strategy substantially improves its performance on both intra-domain and out-of-domain data, with little increase in model complexity.
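To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of one way above-sentence discourse dependencies could be injected into a neural topic segmenter: contextualized sentence representations are fused with the representation of each sentence's discourse head before boundary classification. The class name, layer sizes, and the gating mechanism are all illustrative assumptions, not details taken from the paper.

```python
# A hedged sketch of discourse-dependency injection for topic segmentation.
# Assumptions: pre-computed sentence embeddings and a discourse dependency
# head index per sentence are available; names/dimensions are illustrative.
import torch
import torch.nn as nn

class DiscourseAwareSegmenter(nn.Module):
    def __init__(self, sent_dim=768, hidden=256):
        super().__init__()
        # Contextualize sentence vectors across the document.
        self.context = nn.LSTM(sent_dim, hidden, batch_first=True,
                               bidirectional=True)
        # Gate controlling how much each sentence draws on its discourse
        # head -- a cheap stand-in for a full graph layer over the tree.
        self.edge_gate = nn.Linear(4 * hidden, 1)
        # Binary classifier: is there a topic boundary after sentence i?
        self.boundary = nn.Linear(4 * hidden, 2)

    def forward(self, sent_embs, dep_heads):
        # sent_embs: (1, n_sents, sent_dim) sentence embeddings
        # dep_heads: (n_sents,) index of each sentence's discourse head
        h, _ = self.context(sent_embs)            # (1, n, 2*hidden)
        heads = h[:, dep_heads, :]                # head vector per sentence
        pair = torch.cat([h, heads], dim=-1)      # (1, n, 4*hidden)
        gate = torch.sigmoid(self.edge_gate(pair))
        fused = torch.cat([h, gate * heads], dim=-1)
        return self.boundary(fused)               # (1, n, 2) boundary logits

# Usage: 5 sentences; sentence 0 serves as its own (root) head.
model = DiscourseAwareSegmenter()
embs = torch.randn(1, 5, 768)
heads = torch.tensor([0, 0, 1, 1, 3])
logits = model(embs, heads)                       # shape (1, 5, 2)
```

The gating design reflects the abstract's motivation: the segmenter can learn to rely on the discourse head only when it signals topical consistency, adding just two small linear layers on top of a standard hierarchical encoder, consistent with the claim of little increase in model complexity.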