K-median and k-means are the two most popular objectives for clustering algorithms. Despite intensive effort, a good understanding of the approximability of these objectives, particularly in $\ell_p$-metrics, remains a major open problem. In this paper, we significantly improve upon the hardness of approximation factors known in literature for these objectives in $\ell_p$-metrics. We introduce a new hypothesis called the Johnson Coverage Hypothesis (JCH), which roughly asserts that the well-studied max k-coverage problem on set systems is hard to approximate to a factor greater than 1-1/e, even when the membership graph of the set system is a subgraph of the Johnson graph. We then show that together with generalizations of the embedding techniques introduced by Cohen-Addad and Karthik (FOCS '19), JCH implies hardness of approximation results for k-median and k-means in $\ell_p$-metrics for factors which are close to the ones obtained for general metrics. In particular, assuming JCH we show that it is hard to approximate the k-means objective: $\bullet$ Discrete case: To a factor of 3.94 in the $\ell_1$-metric and to a factor of 1.73 in the $\ell_2$-metric; this improves upon the previous factor of 1.56 and 1.17 respectively, obtained under UGC. $\bullet$ Continuous case: To a factor of 2.10 in the $\ell_1$-metric and to a factor of 1.36 in the $\ell_2$-metric; this improves upon the previous factor of 1.07 in the $\ell_2$-metric obtained under UGC. We also obtain similar improvements under JCH for the k-median objective. Additionally, we prove a weak version of JCH using the work of Dinur et al. (SICOMP '05) on Hypergraph Vertex Cover, and recover all the results stated above of Cohen-Addad and Karthik (FOCS '19) to (nearly) the same inapproximability factors but now under the standard NP$\neq$P assumption (instead of UGC).
翻译:K- 中间值和 k- 中间值是组群算法的两个最受欢迎的目标。 尽管做了大量的努力,但很好地理解这些目标的近似性,特别是在$\ ell_ p美元测量值方面,仍然是一个重大的未决问题。 在本文中,我们大大改进文献中为这些目标所认识的近似系数在$\ ell_ p美元测量值中的难度。 我们引入了一个新的假设,称为 Johnson 覆盖率(JCH),它大致地声称,在设定系统中,经过充分研究的最大成本(k- 美元)问题很难接近于1-1/ e 美元,即使设定系统的会籍图是约翰逊图的子图。我们随后表明,与Cohen- Addad和Karthik(FOCS'19) 引入的嵌入技术的概括性相比,JCH意味着K- 中间值和k- 平价的近似性结果( $\ ell_ p- p- max) 显示, 在一般计量中,目前为1- 美元 美元 的比值( JCH) 的比值( 美元) 平方表示,现在的推算为1- 基) 的推算为1- 的推算为1- 和比值的推算为1. 的比值。