Privacy-preserving distributed distribution comparison measures the distance between distributions whose data are scattered across different agents in a distributed system and cannot be shared among the agents. In this study, we propose a novel decentralized entropic optimal transport (EOT) method that provides a privacy-preserving and communication-efficient solution to this problem with theoretical guarantees. In particular, we design a mini-batch randomized block-coordinate descent (MRBCD) scheme to optimize the decentralized EOT distance in its dual form. The dual variables are distributed across the agents and updated locally and iteratively, with limited communication among a subset of the agents. The kernel matrix involved in the gradients of the dual variables is estimated by a distributed kernel approximation method: each agent only needs to approximate and store a sub-kernel matrix, which it obtains through one-shot communication and without sharing raw data. We analyze the communication complexity of our method and provide a theoretical bound on the approximation error caused by the convergence error, the approximated kernel, and the mismatch between the storage and communication protocols. Experiments on synthetic data and real-world distributed domain adaptation tasks demonstrate the effectiveness of our method.
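To make the dual-form optimization more concrete, here is a small, centralized simulation in plain Python. It is a sketch under stated assumptions, not the authors' MRBCD algorithm. It assumes the standard EOT dual, max over f, g of <f, a> + <g, b> - eps * sum_ij exp((f_i + g_j - C_ij) / eps). In place of the paper's gradient steps, it uses exact closed-form coordinate maximization (Sinkhorn-style updates) on a random mini-batch of one agent's dual variables. It also uses the exact cost matrix rather than the distributed kernel approximation. The function names (logsumexp, dual_value, mini_batch_bcd), the four-agent partition, and all parameter values are hypothetical.

```python
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def dual_value(f, g, a, b, C, eps):
    """EOT dual objective <f,a> + <g,b> - eps * sum_ij exp((f_i + g_j - C_ij) / eps)."""
    linear = sum(x * w for x, w in zip(f, a)) + sum(y * w for y, w in zip(g, b))
    mass = sum(math.exp((f[i] + g[j] - C[i][j]) / eps)
               for i in range(len(f)) for j in range(len(g)))
    return linear - eps * mass


def mini_batch_bcd(a, b, C, eps, agents, batch_size, iters, seed=0):
    """Randomized mini-batch block-coordinate ascent on the EOT dual (hypothetical sketch).

    agents: list of ("f", indices) for agents holding source points or
    ("g", indices) for agents holding target points. Each step picks one agent
    at random and exactly maximizes the dual over a random mini-batch of that
    agent's coordinates, with all other dual variables held fixed.
    """
    rng = random.Random(seed)
    f = [0.0] * len(a)
    g = [0.0] * len(b)
    history = [dual_value(f, g, a, b, C, eps)]
    for _ in range(iters):
        side, indices = rng.choice(agents)
        batch = rng.sample(indices, min(batch_size, len(indices)))
        for k in batch:
            if side == "f":
                # Closed-form maximizer in f_k; needs the current g of every target agent.
                f[k] = eps * math.log(a[k]) - eps * logsumexp(
                    [(g[j] - C[k][j]) / eps for j in range(len(b))])
            else:
                # Closed-form maximizer in g_k; needs the current f of every source agent.
                g[k] = eps * math.log(b[k]) - eps * logsumexp(
                    [(f[i] - C[i][k]) / eps for i in range(len(a))])
        history.append(dual_value(f, g, a, b, C, eps))
    return f, g, history


# Toy problem: 1-D source/target points, squared-distance cost, uniform weights.
xs = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
ys = [0.1, 0.3, 0.5, 0.7, 0.9, 0.95]
a = [1.0 / len(xs)] * len(xs)
b = [1.0 / len(ys)] * len(ys)
C = [[(x - y) ** 2 for y in ys] for x in xs]
eps = 0.5

# Hypothetical split of the dual variables over four agents.
agents = [("f", [0, 1, 2]), ("f", [3, 4, 5]), ("g", [0, 1, 2]), ("g", [3, 4, 5])]
f, g, history = mini_batch_bcd(a, b, C, eps, agents, batch_size=2, iters=3000)

# Entropic transport plan recovered from the dual potentials.
P = [[math.exp((f[i] + g[j] - C[i][j]) / eps) for j in range(len(ys))]
     for i in range(len(xs))]
row_err = max(abs(sum(P[i]) - a[i]) for i in range(len(xs)))
col_err = max(abs(sum(P[i][j] for i in range(len(xs))) - b[j]) for j in range(len(ys)))
print(f"dual value {history[-1]:.6f}, marginal errors {row_err:.1e} / {col_err:.1e}")
```

Each f-update reads the current g of every target agent, and each g-update reads the current f of every source agent. In a real decentralized deployment, this is where inter-agent communication would come in. In this sketch everything runs in one process, and the exact cost matrix C stands in for the per-agent approximated sub-kernels. Because every update exactly maximizes the concave dual over the chosen coordinates, the recorded dual value never decreases.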