Communication-efficient distributed mean estimation is an important primitive arising in many distributed learning and optimization settings, such as federated learning. Without any probabilistic assumptions on the underlying data, we study distributed mean estimation where the server has access to side information. We propose \emph{Wyner-Ziv estimators}, which are communication- and computation-efficient and near-optimal when an upper bound on the distance between the side information and the data is known. As a corollary, we show that our algorithms also provide efficient schemes for the classic Wyner-Ziv problem in information theory. In a different direction, when no bound on the distance between the side information and the data is assumed, we present an alternative Wyner-Ziv estimator based on correlated sampling. This latter setting offers {\em universal recovery guarantees} and may be of interest in practice when the number of users is large and keeping track of the distances between the data and the side information is not feasible.
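To give intuition for how side information can reduce communication, here is a toy one-dimensional sketch of the modulo-quantization idea that underlies Wyner-Ziv-style encoding when a distance bound is known. This is an illustrative simplification, not the paper's actual estimator; the parameters `delta` and `M` and the function names are hypothetical, and the paper's schemes operate on high-dimensional vectors.

```python
# Toy 1-D Wyner-Ziv-style scheme: the client sends only a bin index
# modulo M (log2(M) bits); the server resolves the ambiguity using its
# side information y. Parameters and names are illustrative only.

def encode(x, delta=0.1, M=8):
    """Quantize x to the grid delta*Z and transmit the bin index mod M."""
    return round(x / delta) % M

def decode(m, y, delta=0.1, M=8):
    """Among grid points whose index is congruent to m (mod M), return
    the one closest to the side information y. This recovers x to within
    delta/2 whenever |x - y| < (M - 1) * delta / 2."""
    t = round((y / delta - m) / M)   # nearest congruent index offset
    return (m + M * t) * delta

# Example: true value 3.14, server side information 3.0 (distance 0.14).
# Only 3 bits are communicated, yet the estimate is within delta/2 of x.
msg = encode(3.14)
est = decode(msg, 3.0)
```

The point of the sketch is that the number of bits depends on the distance bound between the data and the side information, not on the full range of the data.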