Conventional closed-world information extraction (IE) approaches rely on human-defined ontologies to set the scope of extraction. As a result, such approaches fall short when applied to new domains. This calls for systems that can automatically infer new types from given corpora, a task which we refer to as type discovery. To tackle this problem, we introduce the idea of type abstraction, where the model is prompted to generalize and name the type. We then use the similarity between inferred names to induce clusters. Observing that this abstraction-based representation is often complementary to the entity/trigger token representation, we set up these two representations as two views and design our model as a co-training framework. Our experiments on multiple relation extraction and event extraction datasets consistently show the advantage of our type abstraction approach. Code is available at https://github.com/raspberryice/type-discovery-abs.
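
To make the step of inducing clusters from inferred names concrete, the following is a minimal sketch and not the implementation in the linked repository. The abstract does not specify the similarity measure or the clustering procedure, so the word-level Jaccard overlap, the single-link grouping, the threshold value, and the function names `name_similarity` and `cluster_by_name` are all illustrative assumptions.

```python
from __future__ import annotations

from itertools import combinations


def name_similarity(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two inferred type names.

    Assumption: a stand-in for whatever similarity the actual model uses.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cluster_by_name(names: list[str], threshold: float = 0.5) -> list[int]:
    """Give each instance a cluster id by linking instances with similar names.

    Assumption: single-link grouping with a fixed threshold, implemented
    with a small union-find structure.
    """
    parent = list(range(len(names)))

    def find(i: int) -> int:
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of instances whose names are similar enough.
    for i, j in combinations(range(len(names)), 2):
        if name_similarity(names[i], names[j]) >= threshold:
            parent[find(i)] = find(j)

    # Relabel roots as consecutive ids in order of first appearance.
    roots: dict[int, int] = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(names))]


# Hypothetical names a model might produce for five trigger mentions.
inferred = ["attack", "military attack", "transfer money", "money transfer", "attack"]
print(cluster_by_name(inferred))  # [0, 0, 1, 1, 0]
```

In this toy input, the three attack-like names fall into one cluster and the two transfer-like names into another. A learned similarity over name representations could be swapped in for `name_similarity` without changing the grouping logic.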