In all state-of-the-art sketching and coreset techniques for clustering, as well as in the best known fixed-parameter tractable approximation algorithms, randomness plays a key role. For the classic $k$-median and $k$-means problems, there are no known deterministic dimensionality reduction procedure or coreset construction that avoid an exponential dependency on the input dimension $d$, the precision parameter $\varepsilon^{-1}$ or $k$. Furthermore, there is no coreset construction that succeeds with probability $1-1/n$ and whose size does not depend on the number of input points, $n$. This has led researchers in the area to ask what is the power of randomness for clustering sketches [Feldman, WIREs Data Mining Knowl. Discov'20]. Similarly, the best approximation ratio achievable deterministically without a complexity exponential in the dimension are $\Omega(1)$ for both $k$-median and $k$-means, even when allowing a complexity FPT in the number of clusters $k$. This stands in sharp contrast with the $(1+\varepsilon)$-approximation achievable in that case, when allowing randomization. In this paper, we provide deterministic sketches constructions for clustering, whose size bounds are close to the best-known randomized ones. We also construct a deterministic algorithm for computing $(1+\varepsilon)$-approximation to $k$-median and $k$-means in high dimensional Euclidean spaces in time $2^{k^2/\varepsilon^{O(1)}} poly(nd)$, close to the best randomized complexity. Furthermore, our new insights on sketches also yield a randomized coreset construction that uses uniform sampling, that immediately improves over the recent results of [Braverman et al. FOCS '22] by a factor $k$.
翻译:暂无翻译