We present an empirical study of methods for span finding, the selection of consecutive tokens in text for downstream tasks. We focus on approaches that can be used to train end-to-end information extraction systems. We find that no single approach is definitively best without considering task properties, and we offer observations to guide future design choices: 1) a tagging approach often yields higher precision, while span enumeration and boundary prediction provide higher recall; 2) span type information can benefit a boundary prediction approach; 3) additional contextualization does not help span finding in most cases.
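To make the three span-finding formulations named in the abstract concrete, the following is a minimal sketch of how each one turns model outputs into spans. It is not the study's implementation: the function names, the BIO tag scheme, the 0.5 threshold, and the `max_width` limit are all assumptions chosen for illustration, and the tags and probabilities are hand-written toy values rather than model predictions.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive token indices (start, end)


def spans_from_bio(tags: List[str]) -> List[Span]:
    """Tagging: decode spans from a per-token BIO tag sequence (assumed scheme)."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new span begins; close any span that is still open.
            if start is not None:
                spans.append((start, i - 1))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(tags) - 1))
    return spans


def enumerate_spans(n_tokens: int, max_width: int) -> List[Span]:
    """Span enumeration: list every candidate span up to max_width tokens.

    A real system would then score each candidate and keep the best ones.
    """
    return [
        (i, j)
        for i in range(n_tokens)
        for j in range(i, min(i + max_width, n_tokens))
    ]


def spans_from_boundaries(
    start_probs: List[float],
    end_probs: List[float],
    threshold: float = 0.5,  # hypothetical cut-off
    max_width: int = 4,      # hypothetical width limit
) -> List[Span]:
    """Boundary prediction: pair likely start tokens with likely end tokens."""
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [j for j, p in enumerate(end_probs) if p >= threshold]
    return [(i, j) for i in starts for j in ends if i <= j < i + max_width]


# Toy sentence: "Barack Obama visited New York City" (6 tokens).
tagged = spans_from_bio(["B", "I", "O", "B", "I", "I"])
candidates = enumerate_spans(n_tokens=3, max_width=2)
bounded = spans_from_boundaries(
    start_probs=[0.9, 0.1, 0.0, 0.8, 0.2, 0.1],
    end_probs=[0.1, 0.9, 0.0, 0.1, 0.3, 0.7],
)
print(tagged, candidates, bounded)
```

In this sketch, tagging commits to one non-overlapping segmentation, while enumeration and boundary pairing can propose overlapping or nested candidates. That difference is one plausible reading of the precision versus recall trade-off reported in the first observation.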