Collaborative learning allows participants to jointly train a model without sharing data. To update the model, the central server broadcasts the model parameters to the clients, and the clients send update directions, such as gradients, back to the server. Although the data never leave a client's device, the communicated gradients and parameters can leak the client's private information. Prior work has developed attacks that infer clients' private data from gradients and parameters. Simple defenses such as dropout and differential privacy either fail to defend against these attacks or seriously hurt test accuracy. We propose a practical defense that we call Double-Blind Collaborative Learning (DBCL). The high-level idea is to apply random matrix sketching to the parameters (i.e., the weights) and to regenerate the random sketch after each iteration. DBCL prevents clients from conducting gradient-based privacy inference, which is the most effective class of attack. DBCL works because, from the attacker's perspective, sketching is effectively random noise that outweighs the signal. Notably, DBCL does not substantially increase computation or communication costs and does not hurt test accuracy at all.
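The high-level idea of sketching the weights and redrawing the sketch every iteration can be illustrated with a minimal NumPy sketch. It makes assumptions the abstract does not state: a single linear layer, a Gaussian sketching matrix applied to both the inputs and the weights, and hypothetical names (`gaussian_sketch`, `sketched_forward`). It is not the authors' implementation.

```python
import numpy as np


def gaussian_sketch(d, s, rng):
    """Draw a d x s Gaussian sketching matrix S with E[S @ S.T] = I_d.

    The Gaussian choice is an assumption made for illustration only.
    """
    return rng.standard_normal((d, s)) / np.sqrt(s)


def sketched_forward(X, W, S):
    """Approximate X @ W.T using only the sketched inputs and weights."""
    X_sketched = X @ S  # shape (n, s)
    W_sketched = W @ S  # shape (d_out, s); a client would see this, not W
    return X_sketched @ W_sketched.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d_in, d_out, s = 4, 64, 8, 32  # hypothetical sizes
    X = rng.standard_normal((n, d_in))
    W = rng.standard_normal((d_out, d_in))
    exact = X @ W.T

    for step in range(3):
        # A fresh sketch is drawn at every iteration, so the sketched
        # weights observed in two different iterations are not comparable.
        S = gaussian_sketch(d_in, s, rng)
        approx = sketched_forward(X, W, S)
        rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
        print(f"step {step}: relative error of one sketch = {rel_err:.2f}")
```

Under these assumptions, a single sketched product is a noisy estimate of the exact one, while the average over many independent sketches approaches it. This is one way to read the claim that an observer of any one iteration sees mostly noise.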