In modern machine learning, users often have to collaborate to learn the distributions that generate their data. Communication can be a significant bottleneck. Prior work has studied homogeneous users, i.e., users whose data follow the same discrete distribution, and has provided optimal communication-efficient methods. However, these methods rely heavily on homogeneity and are less applicable in the common case where users' discrete distributions are heterogeneous. Here we consider a natural and tractable model of heterogeneity, in which users' discrete distributions vary only sparsely, on a small number of entries. We propose a novel two-stage method named SHIFT: first, the users collaborate by communicating with the server to learn a central distribution, relying on methods from robust statistics; then, the learned central distribution is fine-tuned to estimate the individual distributions of users. We show that SHIFT is minimax optimal in our model of heterogeneity and under communication constraints. Further, we provide experimental results using both synthetic data and $n$-gram frequency estimation in the text domain, which corroborate its efficiency.
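The two-stage structure can be sketched concretely. Below is a minimal, hypothetical illustration in Python (NumPy). It assumes the robust aggregation step is a coordinate-wise median of users' empirical distributions and that fine-tuning is a simple hard-thresholding correction on the sparsely deviating entries; the names `central_estimate`, `fine_tune`, and the `threshold` parameter are illustrative assumptions, not the paper's exact estimators or communication protocol.

```python
import numpy as np

def central_estimate(user_histograms):
    """Stage 1 (assumed form): robustly aggregate users' empirical distributions.

    user_histograms: (num_users, domain_size) array of empirical frequencies.
    The coordinate-wise median tolerates a minority of users deviating on
    any given entry, in the spirit of robust statistics.
    """
    p_central = np.median(user_histograms, axis=0)
    return p_central / p_central.sum()  # renormalize to a distribution

def fine_tune(p_central, user_histogram, threshold):
    """Stage 2 (assumed form): correct the few entries where this user deviates.

    Keep the central estimate except on entries where the user's own empirical
    frequency differs by more than `threshold` (a hypothetical tuning
    parameter), exploiting the sparse-heterogeneity assumption.
    """
    p_user = p_central.copy()
    deviates = np.abs(user_histogram - p_central) > threshold
    p_user[deviates] = user_histogram[deviates]
    return p_user / p_user.sum()

# Toy usage: 10 users over a domain of size 5; user 0 shifts mass onto entry 0.
rng = np.random.default_rng(0)
base = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
hists = np.tile(base, (10, 1)) + 0.01 * rng.standard_normal((10, 5))
hists[0, 0] += 0.2
hists = np.clip(hists, 0, None)
hists /= hists.sum(axis=1, keepdims=True)

p_hat = central_estimate(hists)           # shared central distribution
p_user0 = fine_tune(p_hat, hists[0], threshold=0.05)  # user 0's estimate
```

In an actual communication-constrained setting, users would transmit compressed versions of their histograms rather than the full arrays; the sketch omits that step to keep the two stages visible.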