What events make up a pandemic outbreak? What steps should be taken when planning a wedding? Answering such questions typically requires collecting many documents about the complex event of interest, extracting the relevant information, and analyzing it. We present a new approach that instead uses large language models to generate the source documents. Given a high-level event definition, these documents make it possible to predict the specific events, their arguments, and the relations between them, and thus to construct a schema that describes the complex event in its entirety. With our model, complete schemas on any topic can be generated on the fly without any manual data collection, i.e., in a zero-shot manner. Moreover, we develop efficient methods for extracting pertinent information from text and demonstrate in a series of experiments that the resulting schemas are considered more complete than human-curated ones in the majority of the examined scenarios. Finally, we show that this framework performs comparably to previous supervised schema induction methods that rely on collecting real texts, while being more general and flexible because it does not require a predefined ontology.
Title (translated from Chinese): On-the-Fly Event Schema Induction Without Prior Knowledge
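The pipeline described in the abstract (generate documents for a topic, extract events and arguments, then assemble them with their relations into a schema) can be illustrated with a minimal sketch. This is not the authors' implementation. Every name here (`Event`, `Schema`, `toy_generator`, `extract_events`, `induce_schema`, `min_support`) is hypothetical, the language model is replaced by a hard-coded stub, the "documents" use an invented `subject|trigger|object` line format, and the support threshold and order-based "before" relation are simplifying assumptions chosen only to keep the example runnable.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Event:
    trigger: str                # e.g. "book"
    arguments: Tuple[str, ...]  # e.g. ("couple", "venue")


@dataclass
class Schema:
    topic: str
    events: List[Event] = field(default_factory=list)
    relations: List[Tuple[Event, str, Event]] = field(default_factory=list)


def toy_generator(topic: str, n_docs: int) -> List[str]:
    """Hypothetical stand-in for a language model call (ignores the topic)."""
    docs = [
        "couple|set|budget\ncouple|book|venue\ncouple|send|invitations",
        "couple|set|budget\ncouple|book|venue\ncouple|hire|photographer",
        "couple|book|venue\ncouple|send|invitations",
    ]
    return docs[:n_docs]


def extract_events(doc: str) -> List[Event]:
    """Toy extractor: one 'subject|trigger|object' event per line."""
    events = []
    for line in doc.splitlines():
        subj, trigger, obj = line.split("|")
        events.append(Event(trigger, (subj, obj)))
    return events


def induce_schema(
    topic: str,
    generate: Callable[[str, int], List[str]] = toy_generator,
    n_docs: int = 3,
    min_support: int = 2,
) -> Schema:
    per_doc = [extract_events(d) for d in generate(topic, n_docs)]

    # Count in how many generated documents each event appears.
    support = Counter(e for events in per_doc for e in set(events))

    # Keep events seen in at least min_support documents, in first-seen order.
    kept: List[Event] = []
    for events in per_doc:
        for e in events:
            if support[e] >= min_support and e not in kept:
                kept.append(e)

    # Vote on pairwise order: a "before" b if a precedes b within a document.
    order: Counter = Counter()
    for events in per_doc:
        filtered = [e for e in events if e in kept]
        for i, a in enumerate(filtered):
            for b in filtered[i + 1:]:
                order[(a, b)] += 1

    relations = [
        (a, "before", b)
        for (a, b), votes in order.items()
        if votes > order.get((b, a), 0)
    ]
    return Schema(topic=topic, events=kept, relations=relations)


schema = induce_schema("planning a wedding")
for event in schema.events:
    print(event.trigger, event.arguments)
for a, rel, b in schema.relations:
    print(a.trigger, rel, b.trigger)
```

In a realistic setting, `generate` would wrap a call to a language model and `extract_events` would be replaced by an information extraction component; the sketch only shows how events that recur across generated documents could be aggregated into a single schema.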