We propose OmniLytics, a blockchain-based secure data trading marketplace for machine learning applications. Using OmniLytics, many distributed data owners can contribute their private data to collectively train an ML model requested by model owners, and get compensated for their contribution. OmniLytics enables such model training while simultaneously providing 1) model security against curious data owners; 2) data security against curious model and data owners; 3) resilience to malicious data owners who provide faulty results to poison model training; and 4) resilience to a malicious model owner who intends to evade payment. OmniLytics is implemented as a smart contract on the Ethereum blockchain to guarantee the atomicity of payment. In OmniLytics, a model owner publishes an encrypted initial model on the contract; the participating data owners compute gradients over it using their private data and securely aggregate those gradients through the contract. Finally, the contract reimburses the data owners, and the model owner decrypts the aggregated model update. We implement a working prototype of OmniLytics on Ethereum and perform extensive experiments to measure its gas cost and execution time under various parameter combinations, demonstrating its high computational and cost efficiency and strong practicality.
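The aggregate-then-pay flow described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the OmniLytics contract: it uses pairwise additive masking as a stand-in for secure aggregation, since the abstract does not specify the scheme, and it leaves out the model encryption and the defenses against poisoned gradients. All names here (`MODULUS`, `pairwise_masks`, `ToyContract`, `run_demo`) are hypothetical, gradients are small integers for simplicity, and a single seeded generator replaces the per-pair key agreement a real deployment would need.

```python
import random

MODULUS = 2**61 - 1  # hypothetical modulus for masked arithmetic


def pairwise_masks(num_owners, dim, seed=0):
    """Return one mask vector per owner; all masks sum to zero mod MODULUS.

    Assumption: each pair of owners would normally derive its shared
    randomness privately. A single seeded RNG simulates that here.
    """
    rng = random.Random(seed)
    masks = [[0] * dim for _ in range(num_owners)]
    for i in range(num_owners):
        for j in range(i + 1, num_owners):
            shared = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS  # owner i adds
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS  # owner j subtracts
    return masks


def mask_gradient(gradient, mask):
    """Hide an integer gradient vector behind the owner's mask."""
    return [(g + m) % MODULUS for g, m in zip(gradient, mask)]


class ToyContract:
    """Off-chain stand-in for the contract: escrow, aggregation, payout."""

    def __init__(self, deposit, num_owners, dim):
        self.escrow = deposit          # locked by the model owner up front
        self.num_owners = num_owners
        self.total = [0] * dim         # running sum of masked gradients
        self.submitted = set()
        self.balances = {}

    def submit(self, owner_id, masked_gradient):
        if owner_id in self.submitted:
            raise ValueError("duplicate submission")
        self.total = [(t + m) % MODULUS
                      for t, m in zip(self.total, masked_gradient)]
        self.submitted.add(owner_id)

    def finalize(self):
        """Pay every contributor and release the aggregate in one step."""
        if len(self.submitted) != self.num_owners:
            raise RuntimeError("not all owners have submitted")
        reward = self.escrow // self.num_owners
        for owner_id in self.submitted:
            self.balances[owner_id] = reward
        self.escrow -= reward * self.num_owners
        return self.total


def run_demo():
    gradients = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # toy integer gradients
    masks = pairwise_masks(num_owners=3, dim=3, seed=42)
    contract = ToyContract(deposit=300, num_owners=3, dim=3)
    for owner_id, (grad, mask) in enumerate(zip(gradients, masks)):
        contract.submit(owner_id, mask_gradient(grad, mask))
    return contract.finalize(), contract.balances
```

In this toy run the masks cancel in the sum, so the contract only ever sees masked vectors while the released aggregate equals the element-wise sum of the gradients. Payout and release happen in the same `finalize` call, which loosely mirrors the payment atomicity the contract is meant to guarantee.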