Differentially private algorithms for common metric aggregation tasks, such as clustering or averaging, often have limited practicality due to their complexity or to the large number of data points required for accurate results. We propose a simple and practical tool, $\mathsf{FriendlyCore}$, that takes a set of points ${\cal D}$ from an unrestricted (pseudo) metric space as input. When ${\cal D}$ has effective diameter $r$, $\mathsf{FriendlyCore}$ returns a "stable" subset ${\cal C} \subseteq {\cal D}$ that includes all points, except possibly a few outliers, and is {\em certified} to have diameter $r$. $\mathsf{FriendlyCore}$ can be used to preprocess the input before privately aggregating it, potentially simplifying the aggregation or boosting its accuracy. Surprisingly, $\mathsf{FriendlyCore}$ is light-weight, with no dependence on the dimension. We empirically demonstrate its advantages in boosting the accuracy of mean estimation and clustering tasks such as $k$-means and $k$-GMM, outperforming tailored methods.