Large-scale machine learning (ML) based software systems are increasingly developed by distributed teams situated in different trust domains. Insider threats can launch attacks from any domain to compromise ML assets (models and datasets). Therefore, practitioners require information about how and by whom ML assets were developed to assess their quality attributes such as security, safety, and fairness. Unfortunately, it is challenging for ML teams to access and reconstruct such historical information about ML assets (ML provenance) because it is generally fragmented across distributed ML teams and threatened by the same adversaries that attack ML assets. This paper proposes ProML, a decentralised platform that leverages blockchain and smart contracts to empower distributed ML teams to jointly manage a single source of truth about the provenance of circulated ML assets without relying on a third party, which would be vulnerable to insider threats and present a single point of failure. We propose a novel architectural approach called Artefact-as-a-State-Machine to leverage blockchain transactions and smart contracts for managing ML provenance information, and we introduce a user-driven provenance capturing mechanism to integrate existing scripts and tools into ProML without compromising participants' control over their assets and toolchains. We evaluate the performance and overheads of ProML by benchmarking a proof-of-concept system on a global blockchain. Furthermore, we assess ProML's security against a threat model of a distributed ML workflow.
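To make the Artefact-as-a-State-Machine idea more concrete, the following is a minimal sketch of how an ML asset's lifecycle could be tracked as state transitions recorded in an append-only, hash-chained log. It is not ProML's actual design: the state names, the `ArtefactLedger` class, and the in-memory list standing in for a blockchain are all assumptions made purely for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical lifecycle states and allowed transitions (not taken from the paper).
TRANSITIONS = {
    "registered": {"trained"},
    "trained": {"evaluated"},
    "evaluated": {"released", "trained"},
    "released": set(),
}

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def record_hash(body: dict) -> str:
    """Hash a record deterministically (sorted keys) with SHA-256."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class ArtefactLedger:
    """In-memory stand-in for a blockchain: an append-only, hash-chained log."""
    records: list = field(default_factory=list)
    states: dict = field(default_factory=dict)

    def submit(self, artefact_id: str, new_state: str, actor: str) -> dict:
        """Validate a state transition and append it as a provenance record."""
        current = self.states.get(artefact_id)
        if current is None:
            if new_state != "registered":
                raise ValueError("an artefact must first be registered")
        elif new_state not in TRANSITIONS[current]:
            raise ValueError(f"illegal transition {current} -> {new_state}")
        body = {
            "artefact": artefact_id,
            "from": current,
            "to": new_state,
            "actor": actor,
            "prev": self.records[-1]["hash"] if self.records else GENESIS,
        }
        record = {**body, "hash": record_hash(body)}
        self.records.append(record)
        self.states[artefact_id] = new_state
        return record

    def history(self, artefact_id: str) -> list:
        """Return the provenance records of one artefact, oldest first."""
        return [r for r in self.records if r["artefact"] == artefact_id]

    def verify(self) -> bool:
        """Recompute the hash chain; any edited record breaks it."""
        prev = GENESIS
        for r in self.records:
            body = {k: v for k, v in r.items() if k != "hash"}
            if r["prev"] != prev or record_hash(body) != r["hash"]:
                return False
            prev = r["hash"]
        return True


# Usage: two teams contribute to the same artefact's history.
ledger = ArtefactLedger()
ledger.submit("model-1", "registered", "team-a")
ledger.submit("model-1", "trained", "team-b")
```

In this sketch, who did what to an asset is answered by `history`, and tampering with a past record is detected by `verify`. A real deployment would replace the in-memory list with smart-contract state and signed transactions.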