The literature contains many proposals to reduce constituency parsing to tagging. To better understand what these approaches have in common, we cast several existing proposals into a unifying pipeline consisting of three steps: linearization, learning, and decoding. In particular, we show how to reduce tetratagging, a state-of-the-art constituency tagger, to shift-reduce parsing by performing a right-corner transformation on the grammar and making a specific independence assumption. Furthermore, we empirically evaluate our taxonomy of tagging pipelines with different choices of linearizers, learners, and decoders. Based on results for English and a set of 8 typologically diverse languages, we conclude that the linearization of the derivation tree and its alignment with the input sequence are the most critical factors in achieving accurate taggers.
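To make the linearization and decoding steps concrete, the following is a minimal sketch of a tetratagging-style round trip on an unlabeled binary tree. It is an illustration under stated assumptions, not the implementation evaluated in this work: the names `Node`, `linearize`, and `decode` are hypothetical, nonterminal labels and unary chains are ignored, the root is treated as a left child by convention, and the learning step (predicting tags from the sentence) is omitted entirely.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Node:
    """Unlabeled binary node; `right` is None while the node is still open."""
    left: "Tree"
    right: Optional["Tree"]


Tree = Union[str, Node]  # a leaf is just the word itself


def linearize(tree: Tree, is_left: bool = True) -> List[str]:
    """In-order traversal emitting one tag per node.

    Leaves get 'l' or 'r', internal nodes get 'L' or 'R', depending on
    whether the node is a left or right child of its parent.
    Assumption: the root counts as a left child.
    """
    if isinstance(tree, str):
        return ["l" if is_left else "r"]
    return (
        linearize(tree.left, True)
        + ["L" if is_left else "R"]
        + linearize(tree.right, False)
    )


def decode(tags: List[str], words: List[str]) -> Tree:
    """Rebuild the tree from a valid tag sequence with a stack.

    Each stack entry is [root, open_node], where open_node is the node
    whose right child is still missing (None if the subtree is complete).
    """
    stack: list = []
    word_iter = iter(words)
    for tag in tags:
        if tag == "l":  # shift a leaf as a new complete subtree
            stack.append([next(word_iter), None])
        elif tag == "r":  # the leaf fills the open slot on top of the stack
            stack[-1][1].right = next(word_iter)
            stack[-1][1] = None
        elif tag == "L":  # wrap the top subtree as the left child of a new node
            root, _ = stack.pop()
            node = Node(root, None)
            stack.append([node, node])
        elif tag == "R":  # same, but the new node fills the slot below it
            root, _ = stack.pop()
            node = Node(root, None)
            below = stack[-1]
            below[1].right = node
            below[1] = node
        else:
            raise ValueError(f"unknown tag: {tag!r}")
    assert len(stack) == 1 and stack[0][1] is None, "ill-formed tag sequence"
    return stack[0][0]


# Usage: a balanced tree over four words.
tree = Node(Node("A", "B"), Node("C", "D"))
tags = linearize(tree)
restored = decode(tags, ["A", "B", "C", "D"])
```

A tree with n leaves yields 2n - 1 tags, so each word can be paired with the internal-node tag that follows it. This is one way to align the linearized derivation with the input sequence.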