Training machine learning (ML) models typically involves expensive iterative optimization. Once the model's final parameters are released, there is currently no mechanism for the entity which trained the model to prove that these parameters were indeed the result of this optimization procedure. Such a mechanism would support security of ML applications in several ways. For instance, it would simplify ownership resolution when multiple parties contest ownership of a specific model. It would also facilitate distributed training across untrusted workers where Byzantine workers might otherwise mount a denial-of-service by returning incorrect model updates. In this paper, we remediate this problem by introducing the concept of proof-of-learning in ML. Inspired by research on both proof-of-work and verified computations, we observe how a seminal training algorithm, stochastic gradient descent, accumulates secret information due to its stochasticity. This produces a natural construction for a proof-of-learning which demonstrates that a party has expended the compute required to obtain a set of model parameters correctly. In particular, our analyses and experiments show that an adversary seeking to illegitimately manufacture a proof-of-learning needs to perform *at least* as much work as is needed for gradient descent itself. We also instantiate a concrete proof-of-learning mechanism in both of the scenarios described above. In model ownership resolution, it protects the intellectual property of models released publicly. In distributed training, it preserves availability of the training procedure. Our empirical evaluation validates that our proof-of-learning mechanism is robust to variance induced by the hardware (ML accelerators) and software stacks.
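To make the intuition concrete, the sketch below illustrates (in simplified form, not the paper's exact protocol) how stochastic gradient descent naturally yields a verifiable record: the prover logs intermediate weights and the batch order used during training, and a verifier replays a randomly chosen segment of updates, accepting the proof if the recomputed weights land within a small tolerance of the logged checkpoint (the tolerance absorbs hardware- and software-induced non-determinism). Names such as `checkpoint_every` and `tol`, and the toy linear model, are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(512, 10))                     # toy dataset
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=512)

def sgd_step(w, xb, yb, lr=0.01):
    """One SGD update for least-squares regression on a mini-batch."""
    grad = 2.0 * xb.T @ (xb @ w - yb) / len(yb)
    return w - lr * grad

# --- Prover: train and record a proof-of-learning log ---
checkpoint_every = 10
w = np.zeros(10)
checkpoints = {0: w.copy()}                        # step -> recorded weights
batches = []                                       # batch indices; also part of the proof
for step in range(1, 101):
    idx = rng.choice(len(X), size=32, replace=False)
    batches.append(idx)
    w = sgd_step(w, X[idx], y[idx])
    if step % checkpoint_every == 0:
        checkpoints[step] = w.copy()

# --- Verifier: replay one segment of the log and check the claimed checkpoint ---
tol = 1e-6                                         # slack for numerical non-determinism
s0, s1 = 30, 40                                    # verifier samples a segment to recheck
w_replay = checkpoints[s0].copy()
for step in range(s0, s1):                         # batches[step] was used at training step step+1
    w_replay = sgd_step(w_replay, X[batches[step]], y[batches[step]])
print("segment verified:", np.linalg.norm(w_replay - checkpoints[s1]) <= tol)
```

Because an adversary who never performed the training does not know a sequence of intermediate weights and batches that chains correctly from initialization to the final parameters, fabricating such a log requires redoing work comparable to gradient descent itself, which is the core claim of the abstract.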