Although robust learning and local differential privacy are both widely studied, combining the two settings remains almost unexplored. We consider the problem of estimating a discrete distribution in total variation from $n$ contaminated data batches under a local differential privacy constraint. A fraction $1-\epsilon$ of the batches contain $k$ i.i.d. samples drawn from a discrete distribution $p$ over $d$ elements. To protect the users' privacy, each sample is privatized using an $\alpha$-locally differentially private mechanism. The remaining $\epsilon n$ batches are adversarially contaminated. The minimax rate of estimation under contamination alone, with no privacy, is known to be $\epsilon/\sqrt{k}+\sqrt{d/(kn)}$, up to a $\sqrt{\log(1/\epsilon)}$ factor. Under the privacy constraint alone, the minimax rate of estimation is $\sqrt{d^2/(\alpha^2 kn)}$. We show that combining the two constraints leads to a minimax estimation rate of $\epsilon\sqrt{d/(\alpha^2 k)}+\sqrt{d^2/(\alpha^2 kn)}$, up to a $\sqrt{\log(1/\epsilon)}$ factor, which is larger than the sum of the two separate rates. We provide a polynomial-time algorithm achieving this bound, as well as a matching information-theoretic lower bound.
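To get a feel for how the three rates compare, the following minimal sketch evaluates them numerically. It is an illustration only: constants and the $\sqrt{\log(1/\epsilon)}$ factor are dropped, the function names are hypothetical, and the parameter values are arbitrary choices not taken from the paper.

```python
import math


def rate_robust_only(eps, k, n, d):
    """Contamination alone, no privacy: eps/sqrt(k) + sqrt(d/(k*n))."""
    return eps / math.sqrt(k) + math.sqrt(d / (k * n))


def rate_private_only(alpha, k, n, d):
    """Privacy alone, no contamination: sqrt(d^2 / (alpha^2 * k * n))."""
    return math.sqrt(d**2 / (alpha**2 * k * n))


def rate_combined(eps, alpha, k, n, d):
    """Both constraints: eps*sqrt(d/(alpha^2*k)) + sqrt(d^2/(alpha^2*k*n))."""
    return eps * math.sqrt(d / (alpha**2 * k)) + math.sqrt(
        d**2 / (alpha**2 * k * n)
    )


# Arbitrary illustrative parameters (assumptions, not from the paper).
eps, alpha, k, n, d = 0.1, 0.5, 100, 10_000, 20

robust = rate_robust_only(eps, k, n, d)
private = rate_private_only(alpha, k, n, d)
combined = rate_combined(eps, alpha, k, n, d)

print(f"robust only : {robust:.4f}")
print(f"private only: {private:.4f}")
print(f"sum of both : {robust + private:.4f}")
print(f"combined    : {combined:.4f}")
```

For these particular values the combined rate exceeds the sum of the two separate rates. The gap comes from the contamination term, which grows from $\epsilon/\sqrt{k}$ to $\epsilon\sqrt{d}/(\alpha\sqrt{k})$ once privacy is imposed.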