Automatic Process Discovery aims to develop algorithmic methods for extracting and eliciting process models from their descriptions in data. While Process Discovery from event-log data is a well-established area that has already moved from research to mature, concrete adoption, Process Discovery from text is still at an early stage of development and rarely scales to real-world documents. In this paper we comparatively analyze reference state-of-the-art literature, focusing on the techniques used, the process elements extracted, and the evaluations performed. Based on this analysis, we discuss important limitations that hamper the exploitation of recent Natural Language Processing techniques in this field, together with fundamental limitations and challenges for the future concerning the datasets, the techniques, the experimental evaluations, and the pipelines, both those currently adopted and those still to be developed.