To monitor crises, political events are extracted from the news. The sheer volume of unstructured, full-text event descriptions makes case-by-case analysis unmanageable, particularly for low-resource humanitarian aid organizations. This creates a demand to classify events into event types, a task referred to as event coding. Typically, domain experts craft an event type ontology, annotators label a large dataset, and technical experts develop a supervised coding system. In this work, we propose PR-ENT, a new event coding approach that is more flexible and resource-efficient while maintaining competitive accuracy. First, we extend an event description such as "Military injured two civilians" with a template, e.g. "People were [Z]", and prompt a pre-trained (cloze) language model to fill the slot Z. Second, we select answer candidates Z* = {"injured", "hurt", ...} by treating the event description as the premise and the filled templates as hypotheses in a textual entailment task. This allows domain experts to draft the codebook directly as labeled prompts and interpretable answer candidates. This human-in-the-loop process is guided by our interactive codebook design tool. We evaluate PR-ENT in several robustness checks: perturbing the event description and prompt template, restricting the vocabulary, and removing contextual information.
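The two steps described above (prompting, then entailment filtering) can be illustrated with a minimal sketch. It is not the PR-ENT implementation: the function names `pr_ent` and `code_event`, the threshold of 0.5, the toy candidate list, the toy entailment scores, and the codebook entry are all assumptions made for illustration. In practice, `fill_slot` would wrap a pre-trained cloze language model and `entail_prob` a textual entailment model; here both are replaced by hard-coded stand-ins so the example runs without any model.

```python
from typing import Callable, Dict, List, Set, Tuple


def pr_ent(
    event: str,
    template: str,
    fill_slot: Callable[[str], List[str]],
    entail_prob: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
    slot: str = "[Z]",
) -> List[Tuple[str, float]]:
    """Return slot fillers whose filled template is entailed by the event."""
    # Step 1 (prompting): append the template to the event description and
    # ask the cloze model for candidate fillers of the slot.
    prompt = f"{event} {template}"
    candidates = fill_slot(prompt)

    # Step 2 (entailment): the event is the premise, each filled template
    # is a hypothesis; keep fillers that score at or above the threshold.
    kept = []
    for token in candidates:
        hypothesis = template.replace(slot, token)
        score = entail_prob(event, hypothesis)
        if score >= threshold:
            kept.append((token, score))
    return kept


def code_event(
    answers: List[Tuple[str, float]], codebook: Dict[str, Set[str]]
) -> List[str]:
    """Assign every event type whose answer candidates overlap the answers."""
    tokens = {token for token, _ in answers}
    return [label for label, words in codebook.items() if tokens & words]


# --- Toy stand-ins (hypothetical; real systems would call actual models) ---

def toy_fill_slot(prompt: str) -> List[str]:
    # A cloze model would rank vocabulary tokens for the slot in `prompt`.
    return ["injured", "hurt", "happy", "arrested"]


TOY_ENTAILMENT = {
    "People were injured.": 0.97,
    "People were hurt.": 0.93,
    "People were happy.": 0.02,
    "People were arrested.": 0.10,
}


def toy_entail_prob(premise: str, hypothesis: str) -> float:
    # An entailment model would score P(premise entails hypothesis).
    return TOY_ENTAILMENT.get(hypothesis, 0.0)


# Hypothetical codebook: one labeled prompt with its answer candidates.
TOY_CODEBOOK = {"Violence against civilians": {"injured", "hurt", "killed"}}
```

With these stand-ins, `pr_ent("Military injured two civilians.", "People were [Z].", toy_fill_slot, toy_entail_prob)` keeps only "injured" and "hurt", and `code_event` maps them to the single hypothetical event type. Passing the two models as callables keeps the sketch independent of any particular library.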