Provenance management is essential to the overall security and reliability of long-tail microscopy (LTM) data management systems. However, provenance in domains with LTM data poses several challenges. Provenance data must be collected more frequently, which increases system overhead (in terms of computation and storage) and leads to scalability issues. Moreover, in most scientific application domains, a provenance solution must also account for network-related events. As a result, provenance data in LTM data management systems are highly diverse and must be organized and processed carefully. In this paper, we introduce a novel provenance service, called ProvLet, to collect, distribute, analyze, and visualize provenance data in LTM data management systems. Specifically, (1) we address how to filter the desired transactions and store them on disk; (2) we adopt a data organization model built on higher-level data abstractions, such as datasets and collections, that suit step-by-step scientific experiments, and we develop provenance algorithms over these abstractions rather than over low-level abstractions such as files and folders; and (3) we use ProvLet's log files to visualize provenance information for further forensic exploration. Validation of ProvLet with actual long-tail microscopy data, collected over a period of six years, shows that the service incurs low system overhead and supports scalability.