An Investigation of Smart Contract for Collaborative Machine Learning Model Training

Machine learning (ML) has penetrated various fields in the era of big data. The advantage of collaborative machine learning (CML) over most conventional ML lies in the joint effort of decentralized nodes or agents, which results in better model performance and generalization. Because training ML models requires a massive amount of good-quality data, it is necessary both to address concerns about data privacy and to ensure that the data are of high quality. To solve this problem, we examine the integration of CML and smart contracts. Built on blockchain, smart contracts enable the automatic execution of data preservation and validation, as well as the continuity of CML model training. In our simulation experiments, we define incentive mechanisms on the smart contract and investigate important factors such as the number of features in the dataset (num_words), the size of the training data, and the cost for data holders to submit data. We then conclude how these factors affect three performance metrics: the accuracy of the trained model, the gap between the model's accuracies before and after the simulation, and the time it takes to use up the balance of a bad agent. For instance, our experimental results show that increasing num_words leads to higher model accuracy and eliminates the negative influence of malicious agents in a shorter time. Statistical analyses show that, with the help of smart contracts, the influence of invalid data is efficiently diminished and model robustness is maintained. We also discuss gaps in existing research and propose possible directions for future work.
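To make the "time to use up the balance of a bad agent" metric more concrete, the following is a minimal sketch of one possible deposit-based incentive rule. It is an illustration under stated assumptions, not the mechanism used in the paper: the `Agent` class, the `rounds_until_broke` function, the refund-or-forfeit rule, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    balance: float
    is_good: bool  # hypothetical flag: does this agent submit valid data?


def rounds_until_broke(agent: Agent, deposit: float, max_rounds: int = 10_000) -> int:
    """Count the submissions an agent makes before it cannot afford the deposit.

    Assumed rule (not taken from the paper): the deposit is refunded when the
    submitted data is valid and forfeited when it is invalid.
    """
    rounds = 0
    while agent.balance >= deposit and rounds < max_rounds:
        agent.balance -= deposit      # pay the cost of submitting data
        if agent.is_good:
            agent.balance += deposit  # valid data: the deposit is refunded
        rounds += 1
    return rounds


bad = Agent("bad", balance=100.0, is_good=False)
good = Agent("good", balance=100.0, is_good=True)

bad_rounds = rounds_until_broke(bad, deposit=10.0)
good_rounds = rounds_until_broke(good, deposit=10.0, max_rounds=50)
```

Under this assumed rule, a larger submission cost exhausts a bad agent's balance in fewer rounds, while an agent that always submits valid data is limited only by `max_rounds`.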
