Previous CCG supertaggers usually predict categories using multi-class classification. Despite its simplicity, this approach usually ignores the internal structure of categories. The rich semantics inside these structures may help us better handle relations among categories and make existing supertaggers more robust. In this work, we propose to generate categories rather than classify them: each category is decomposed into a sequence of smaller atomic tags, and the tagger aims to generate the correct sequence. We show that with this finer-grained view of categories, annotations can be shared across different categories and interactions with sentence contexts can be enhanced. The proposed category generator achieves state-of-the-art tagging (95.5% accuracy) and parsing (89.8% labeled F1) performance on the standard CCGBank. Furthermore, its results on infrequent (even unseen) categories, out-of-domain texts, and low-resource language settings suggest that generation models are promising for CCG analysis more generally.
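The abstract does not specify how a category is decomposed into atomic tags. As a minimal sketch, assuming a prefix (Polish-notation) linearization in which each slash is emitted before its result and argument, the Python below parses a CCGBank-style category string, flattens it into a tag sequence, and rebuilds it. The functions `parse`, `linearize`, `delinearize`, and `to_string`, and the prefix ordering itself, are hypothetical illustrations rather than the paper's actual scheme.

```python
from typing import List, Tuple, Union

# Hypothetical representation (an assumption, not the paper's): a category is
# either an atomic label ("NP", "S[dcl]") or a (result, slash, argument) triple.
Category = Union[str, Tuple["Category", str, "Category"]]
SLASHES = ("/", "\\")


def parse(cat: str) -> Category:
    """Parse a category string such as "(S[dcl]\\NP)/NP" (slashes left-associative)."""
    pos = 0

    def atom_or_group() -> Category:
        nonlocal pos
        if pos >= len(cat):
            raise ValueError(f"unexpected end of input in {cat!r}")
        if cat[pos] == "(":
            pos += 1  # consume "("
            inner = functor()
            if pos >= len(cat) or cat[pos] != ")":
                raise ValueError(f"expected ')' at position {pos} in {cat!r}")
            pos += 1  # consume ")"
            return inner
        start = pos
        # An atomic label runs until the next slash or parenthesis, so
        # bracketed features such as [dcl] stay attached to the label.
        while pos < len(cat) and cat[pos] not in "()/\\":
            pos += 1
        if start == pos:
            raise ValueError(f"empty atomic category at position {pos} in {cat!r}")
        return cat[start:pos]

    def functor() -> Category:
        nonlocal pos
        left = atom_or_group()
        while pos < len(cat) and cat[pos] in SLASHES:
            slash = cat[pos]
            pos += 1
            left = (left, slash, atom_or_group())
        return left

    tree = functor()
    if pos != len(cat):
        raise ValueError(f"trailing input at position {pos} in {cat!r}")
    return tree


def linearize(c: Category) -> List[str]:
    """Flatten a category into atomic tags in prefix order: slash, result, argument."""
    if isinstance(c, str):
        return [c]
    result, slash, arg = c
    return [slash] + linearize(result) + linearize(arg)


def delinearize(tags: List[str]) -> Category:
    """Rebuild a category from a prefix-order tag sequence."""
    pos = 0

    def build() -> Category:
        nonlocal pos
        if pos >= len(tags):
            raise ValueError("tag sequence ended before the category was complete")
        tag = tags[pos]
        pos += 1
        if tag in SLASHES:
            result = build()
            arg = build()
            return (result, tag, arg)
        return tag

    tree = build()
    if pos != len(tags):
        raise ValueError("extra tags after a complete category")
    return tree


def to_string(c: Category, top: bool = True) -> str:
    """Render a category, parenthesizing every functor below the top level."""
    if isinstance(c, str):
        return c
    result, slash, arg = c
    s = to_string(result, False) + slash + to_string(arg, False)
    return s if top else f"({s})"


transitive = parse(r"(S[dcl]\NP)/NP")
print(linearize(transitive))                      # ['/', '\\', 'S[dcl]', 'NP', 'NP']
print(linearize(parse(r"S[dcl]\NP")))             # ['\\', 'S[dcl]', 'NP']
print(to_string(delinearize(linearize(transitive))))  # (S[dcl]\NP)/NP
```

Under this assumed scheme, the tags for the intransitive category `S[dcl]\NP` reappear inside the sequence for the transitive `(S[dcl]\NP)/NP`. This illustrates one way a sequence view lets different categories, including rare ones, share training signal at the level of atomic tags.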