The difficulty of acquiring sufficient training data is a major bottleneck for machine learning (ML)-based data analytics. Recently, commoditizing ML models has been proposed as an economical and practical solution to ML-oriented data acquisition. However, existing model marketplaces assume that the broker can access data owners' private training data, which may not be realistic in practice. In this paper, to promote trustworthy data acquisition for ML tasks, we propose FL-Market, a locally private model marketplace that protects privacy not only against model buyers but also against the untrusted broker. FL-Market uses federated learning to remove the need for the broker to centrally gather training data. Federated learning is an emerging privacy-preserving ML paradigm in which data owners collaboratively train an ML model by uploading local gradients, which are aggregated into a global gradient for updating the model. In addition, FL-Market lets data owners perturb their gradients locally under local differential privacy, further mitigating privacy risks. To drive FL-Market, we propose a deep learning-empowered auction mechanism that intelligently decides the perturbation levels of the local gradients, and an optimal aggregation mechanism for aggregating the perturbed gradients. Together, the auction and aggregation mechanisms maximize the accuracy of the global gradient, which in turn optimizes model buyers' utility. Our experiments verify the effectiveness of the proposed mechanisms.
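The abstract does not specify which local differential privacy mechanism or aggregation rule FL-Market uses. The following is only a minimal sketch of the general idea under stated assumptions. Each data owner clips its gradient to an L1 norm bound and adds Laplace noise calibrated to its own privacy budget epsilon. The broker then combines the noisy gradients with inverse-variance weights, which minimize the variance of the aggregated noise when the owners' clipped gradients share a common mean. The Laplace mechanism, the inverse-variance weighting, the clipping bound, and all function names (`clip_l1`, `perturb`, `aggregate`, etc.) are hypothetical illustrations. They are not FL-Market's auction-driven perturbation levels or its optimal aggregation mechanism.

```python
import random
from typing import List, Sequence


def clip_l1(grad: Sequence[float], bound: float) -> List[float]:
    """Scale the gradient down so that its L1 norm is at most `bound`."""
    norm = sum(abs(g) for g in grad)
    scale = min(1.0, bound / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb(grad: Sequence[float], epsilon: float, bound: float,
            rng: random.Random) -> List[float]:
    """epsilon-LDP release of a clipped gradient via the Laplace mechanism (assumed).

    Any two clipped gradients differ by at most 2 * bound in L1 norm,
    so each coordinate receives Laplace noise with scale 2 * bound / epsilon.
    """
    clipped = clip_l1(grad, bound)
    b = 2.0 * bound / epsilon
    return [g + laplace(b, rng) for g in clipped]


def inverse_variance_weights(epsilons: Sequence[float]) -> List[float]:
    """Laplace noise variance is 2 * b^2 = 8 * bound^2 / eps^2, so weights are proportional to eps^2."""
    total = sum(e * e for e in epsilons)
    return [e * e / total for e in epsilons]


def aggregate(noisy_grads: Sequence[Sequence[float]],
              epsilons: Sequence[float]) -> List[float]:
    """Weighted average of the noisy local gradients (hypothetical aggregation rule)."""
    weights = inverse_variance_weights(epsilons)
    dim = len(noisy_grads[0])
    return [sum(w * g[k] for w, g in zip(weights, noisy_grads)) for k in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical local gradients from three data owners.
    local_grads = [[0.5, -0.2, 0.1], [0.4, -0.3, 0.2], [0.6, -0.1, 0.0]]
    # Hypothetical per-owner privacy budgets (larger epsilon means less noise).
    epsilons = [0.5, 1.0, 4.0]
    noisy = [perturb(g, e, bound=1.0, rng=rng) for g, e in zip(local_grads, epsilons)]
    print(aggregate(noisy, epsilons))
```

Owners with larger privacy budgets add less noise and therefore receive larger weights. This illustrates why the choice of perturbation levels and the aggregation rule jointly determine the accuracy of the global gradient.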