We study the problem of constructing small coresets for $k$-Median in Euclidean spaces. Given a large set of data points $P\subset \mathbb{R}^d$, a coreset is a much smaller set $S\subset \mathbb{R}^d$ such that the $k$-Median costs of any $k$ centers with respect to $P$ and $S$ are close. The existing literature focuses mainly on the high-dimensional case, where there has been great success in obtaining dimension-independent bounds, whereas the case of small $d$ is largely unexplored. Because many applications of Euclidean clustering algorithms are in small dimensions, and the current literature lacks a systematic study of this regime, this paper investigates coresets for $k$-Median in small dimensions. For small $d$, a natural question is whether the existing near-optimal dimension-independent bounds can be significantly improved. We answer this question affirmatively for a range of parameters. We also prove new lower bounds, which are the strongest known for small $d$. In particular, we completely settle the coreset size bound for $1$-d $k$-Median (up to log factors). Interestingly, our results imply a strong separation between $1$-d $1$-Median and $1$-d $2$-Median. As far as we know, this is the first such separation between $k=1$ and $k=2$ in any dimension.
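To make "the costs are close" concrete, the coreset guarantee is commonly formalized as follows. This is a sketch of the standard definition, not a statement taken from the abstract: the error parameter $\varepsilon$ and the weight function $w$ on $S$ are assumptions introduced here for illustration.

```latex
% k-Median cost of a center set C (|C| = k) with respect to the data set P
\mathrm{cost}(P, C) = \sum_{p \in P} \min_{c \in C} \lVert p - c \rVert_2

% Cost with respect to a weighted set S with weights w : S \to \mathbb{R}_{\ge 0} (assumed)
\mathrm{cost}(S, C) = \sum_{s \in S} w(s) \min_{c \in C} \lVert s - c \rVert_2

% S is an \varepsilon-coreset of P if, for every C \subset \mathbb{R}^d with |C| = k,
(1 - \varepsilon)\,\mathrm{cost}(P, C) \;\le\; \mathrm{cost}(S, C) \;\le\; (1 + \varepsilon)\,\mathrm{cost}(P, C)
```

Under this formalization, the coreset size $|S|$ is the quantity being bounded, typically as a function of $k$, $d$, and $\varepsilon$.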