Process extraction from text is an important task in process discovery, and various approaches to it have been developed in recent years. However, in contrast to other information extraction tasks, there is a lack of gold-standard corpora of business process descriptions that are carefully annotated with all the entities and relationships of interest. As a result, it is currently hard to compare the results of extraction approaches objectively, and the lack of annotated texts also prevents the application of the data-driven information extraction methodologies typical of the natural language processing field. To bridge this gap, we present the PET dataset, a first corpus of business process descriptions annotated with activities, gateways, actors, and flow information. We describe the new resource and provide a variety of baselines to benchmark the difficulty and challenges of business process extraction from text. PET can be accessed via huggingface.co/datasets/patriziobellan/PET.