Simile recognition involves two subtasks: simile sentence classification, which determines whether a sentence contains a simile, and simile component extraction, which locates the corresponding objects (i.e., tenors and vehicles). Recent work ignores features other than surface strings. In this paper, we explore expressive features for this task to achieve more effective data utilization. In particular, we study two types of features: 1) input-side features, including POS tags, dependency trees, and word definitions, and 2) decoding features, which capture the interdependence among various decoding decisions. We further construct a model named HGSR, which merges the input-side features into a heterogeneous graph and leverages the decoding features via distillation. Experiments show that HGSR significantly outperforms the current state-of-the-art systems and carefully designed baselines, verifying the effectiveness of the introduced features. Our code is available at https://github.com/DeepLearnXMU/HGSR.
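To make the idea of merging input-side features into a single heterogeneous graph more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not HGSR's actual implementation. The node types (`word`, `pos`, `def`), the edge types (`has_pos`, `dep:<label>`, `has_def`), the helper `build_graph`, the `HeteroGraph` container, and the CoNLL-style head indexing (1-based, `0` = root) are all hypothetical choices made for this example. The graph encoder and the distillation of decoding features are not shown.

```python
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Hypothetical container: typed nodes and typed, directed edges."""
    # node id -> (node_type, payload)
    nodes: dict = field(default_factory=dict)
    # list of (src_id, edge_type, dst_id)
    edges: list = field(default_factory=list)

    def add_node(self, node_id, node_type, payload):
        self.nodes[node_id] = (node_type, payload)

    def add_edge(self, src, edge_type, dst):
        self.edges.append((src, edge_type, dst))


def build_graph(tokens, pos_tags, heads, dep_labels, definitions):
    """Sketch: merge POS tags, dependency arcs, and word definitions
    into one heterogeneous graph around the word nodes (assumed design)."""
    g = HeteroGraph()
    # One node per token.
    for i, tok in enumerate(tokens):
        g.add_node(f"w{i}", "word", tok)
    # One shared node per distinct POS tag, linked from each word.
    for i, tag in enumerate(pos_tags):
        pid = f"pos:{tag}"
        if pid not in g.nodes:
            g.add_node(pid, "pos", tag)
        g.add_edge(f"w{i}", "has_pos", pid)
    # Dependency arcs from head word to dependent; head 0 marks the root.
    for i, (head, label) in enumerate(zip(heads, dep_labels)):
        if head > 0:
            g.add_edge(f"w{head - 1}", f"dep:{label}", f"w{i}")
    # One definition node per token that has a gloss in the dictionary.
    for i, tok in enumerate(tokens):
        gloss = definitions.get(tok.lower())
        if gloss is not None:
            did = f"def{i}"
            g.add_node(did, "def", gloss)
            g.add_edge(f"w{i}", "has_def", did)
    return g


if __name__ == "__main__":
    # Toy parse and gloss, invented for illustration only.
    g = build_graph(
        tokens=["Her", "smile", "is", "like", "sunshine"],
        pos_tags=["PRP$", "NN", "VBZ", "IN", "NN"],
        heads=[2, 3, 0, 5, 3],
        dep_labels=["poss", "nsubj", "root", "case", "nmod"],
        definitions={"sunshine": "direct light from the sun"},
    )
    for edge in g.edges:
        print(edge)
```

In a full model, each node would presumably be embedded and passed to a graph neural network that treats each edge type separately; this sketch stops at graph construction.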