Span-based decoding is an important direction in constituency parsing. For Chinese sentences, however, written text does not mark word boundaries, so a separate model must first perform word segmentation. This step introduces uncertainty, and its errors generally propagate into the constituency tree computed afterward. This work proposes a method for joint Chinese word segmentation and span-based constituency parsing that adds extra labels to individual Chinese characters in the parse trees. In experiments on CTB 5.1, the proposed algorithm outperforms recent models for joint segmentation and constituency parsing.
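The idea of labeling individual characters so that a single span-based tree encodes both segmentation and syntax can be illustrated with a minimal sketch. The abstract does not specify the label scheme, so everything below is an assumption made for illustration: the `#W` suffix marking word-level spans, the `CHAR` label on each character, the nested-tuple tree format, and the function names `to_char_tree`, `labeled_spans`, and `segment` are all hypothetical and are not taken from the paper.

```python
# Hypothetical label scheme (an assumption, not the paper's):
WORD_MARK = "#W"     # suffix added to a POS label to mark a word-level span
CHAR_LABEL = "CHAR"  # extra label attached to every single character

# A tree is (label, children): children is a str for a leaf,
# or a list of subtrees for an internal node.


def to_char_tree(tree):
    """Expand each word leaf into character leaves under a marked word node."""
    label, children = tree
    if isinstance(children, str):
        # Word leaf: keep its POS label, mark it as a word, split into characters.
        return (label + WORD_MARK, [(CHAR_LABEL, ch) for ch in children])
    return (label, [to_char_tree(child) for child in children])


def labeled_spans(tree, start=0):
    """Return (spans, end); spans are (label, i, j) over character indices."""
    label, children = tree
    if isinstance(children, str):
        end = start + len(children)
        return [(label, start, end)], end
    spans = []
    pos = start
    for child in children:
        child_spans, pos = labeled_spans(child, pos)
        spans.extend(child_spans)
    # Put the parent span before its descendants (pre-order).
    spans.insert(0, (label, start, pos))
    return spans, pos


def segment(spans, text):
    """Recover the word segmentation from the spans that carry the word mark."""
    return [text[i:j] for (label, i, j) in spans if label.endswith(WORD_MARK)]


# Word-level tree for the sentence "我爱北京".
word_tree = ("IP", [
    ("NP", [("PN", "我")]),
    ("VP", [("VV", "爱"), ("NP", [("NR", "北京")])]),
])

char_tree = to_char_tree(word_tree)
spans, length = labeled_spans(char_tree)
print(segment(spans, "我爱北京"))
```

Under this assumed scheme, a span-based parser that predicts labeled spans over characters would produce the word boundaries as a by-product of the tree, since every span whose label ends in `#W` is a word.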