Causal relationships are commonly examined in manufacturing processes to support fault investigations, perform interventions, and make strategic decisions. Industry 4.0 has made available an increasing amount of data that enables data-driven Causal Discovery (CD). Given the growing number of recently proposed CD methods, strict benchmarking procedures on publicly available datasets are needed, since such datasets are the foundation for a fair comparison and validation of different methods. This work introduces two novel public datasets for CD in continuous manufacturing processes. The first dataset employs the well-known Tennessee Eastman simulator for fault detection and process control. The second dataset is extracted from an ultra-processed food manufacturing plant and includes a description of the plant, as well as multiple ground truths. These datasets are used to propose a benchmarking procedure based on different metrics, which is evaluated on a wide selection of CD algorithms. This work allows CD methods to be tested in realistic conditions, enabling the selection of the most suitable method for specific target applications. The datasets are available at the following link: https://github.com/giovanniMen
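The abstract does not list the metrics used in the benchmarking procedure. The sketch below shows one common way to score an estimated causal graph against a ground truth. It assumes both graphs are given as directed adjacency matrices over the same process variables, where entry `[i][j] = 1` means an edge i → j. It computes edge-level precision, recall, and F1, together with a structural Hamming distance (SHD) that counts a reversed edge as a single error. The function name `edge_metrics` and this choice of metrics are illustrative assumptions and are not taken from the paper.

```python
from typing import Dict, List

# Directed adjacency matrix: m[i][j] == 1 means an edge i -> j (illustrative convention).
Matrix = List[List[int]]


def edge_metrics(true_adj: Matrix, est_adj: Matrix) -> Dict[str, float]:
    """Hypothetical helper: compare an estimated causal graph with a ground truth."""
    n = len(true_adj)
    tp = fp = fn = 0
    # Directed-edge confusion counts (self-loops ignored).
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            t, e = true_adj[i][j], est_adj[i][j]
            if t and e:
                tp += 1
            elif e and not t:
                fp += 1
            elif t and not e:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # SHD variant: one error per unordered pair whose edge state differs
    # (missing, extra, or reversed edge each cost 1).
    shd = 0
    for i in range(n):
        for j in range(i + 1, n):
            if (true_adj[i][j], true_adj[j][i]) != (est_adj[i][j], est_adj[j][i]):
                shd += 1
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision,
            "recall": recall, "f1": f1, "shd": shd}


if __name__ == "__main__":
    # Toy ground truth: x0 -> x1 -> x2. The estimate adds x0 -> x2 and reverses x1 -> x2.
    truth = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
    estimate = [[0, 1, 1], [0, 0, 0], [0, 1, 0]]
    print(edge_metrics(truth, estimate))
```

In this toy case the estimate recovers one of the two true edges and adds two wrong ones, so precision is 1/3, recall is 1/2, F1 is 0.4, and SHD is 2. When a dataset provides multiple ground truths, as the food-plant dataset does, the same comparison could be repeated against each of them.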