In lexical semantics, full-sentence segmentation and segment labeling of various phenomena are generally treated separately, despite their interdependence. We hypothesize that a unified lexical semantic recognition task is an effective way to encapsulate previously disparate styles of annotation, including multiword expression identification and classification and supersense tagging. Using the STREUSLE corpus, we train a neural CRF sequence tagger and evaluate its performance along various axes of annotation. As the label set generalizes that of previous tasks (PARSEME, DiMSUM), we additionally evaluate how well the model generalizes to those test sets, finding that it approaches or surpasses existing models despite training only on STREUSLE. Our work also establishes baseline models and evaluation metrics for integrated and accurate modeling of lexical semantics, facilitating future work in this area.
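A neural CRF tagger generally pairs a neural encoder, which scores every label at every token, with a linear-chain CRF layer that scores label-to-label transitions. Decoding then uses the Viterbi algorithm to find the jointly best label sequence rather than picking each token's label independently. The sketch below shows only that generic decoding step, under stated assumptions. The function name `viterbi_decode`, the toy `O`/`B`/`I` label set, and all scores are hypothetical illustrations. They are not the authors' implementation or the STREUSLE tag inventory.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    labels: Sequence[str],
) -> Tuple[List[str], float]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    Hypothetical helper (generic sketch, not the paper's code):
    emissions[t][j]   -- score of label j at token t (e.g. from a neural encoder)
    transitions[i][j] -- score of moving from label i to label j
    """
    n_labels = len(labels)
    # best[j]: score of the best path ending in label j at the current token
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(n_labels):
            # Pick the previous label i that maximizes path score + transition
            scores = [best[i] + transitions[i][j] for i in range(n_labels)]
            i_star = max(range(n_labels), key=lambda i: scores[i])
            new_best.append(scores[i_star] + emissions[t][j])
            pointers.append(i_star)
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final label to recover the full path
    last = max(range(n_labels), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [labels[j] for j in path], best[last]


# Toy example: tag "look up words", where "look up" should form one unit.
labels = ["O", "B", "I"]
# A large penalty makes the invalid transition O -> I effectively impossible.
transitions = [
    [0.0, 0.0, -100.0],  # from O
    [0.0, 0.0, 0.0],     # from B
    [0.0, 0.0, 0.0],     # from I
]
emissions = [
    [1.0, 0.8, 0.0],  # "look"
    [0.2, 0.0, 1.5],  # "up"
    [1.0, 0.5, 0.1],  # "words"
]
path, score = viterbi_decode(emissions, transitions, labels)
print(path, score)
```

Labeling each token with its own highest-scoring label would yield `O I O`, which contains the disallowed `O -> I` transition. Joint decoding instead returns `B I O`, grouping "look up" as a single unit. This interaction between neighboring labels is what a CRF layer contributes over independent per-token classification.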