In this paper, we present a linear-time approximation scheme for $k$-means clustering of \emph{incomplete} data points in $d$-dimensional Euclidean space. An \emph{incomplete} data point with $\Delta>0$ unspecified entries is represented as an axis-parallel affine subspace of dimension $\Delta$. The distance between two incomplete data points is defined as the Euclidean distance between the two closest points in the axis-parallel affine subspaces corresponding to the data points. We present an algorithm for $k$-means clustering of axis-parallel affine subspaces of dimension $\Delta$ that yields a $(1+\epsilon)$-approximate solution in $O(nd)$ time. The constants hidden behind $O(\cdot)$ depend only on $\Delta$, $\epsilon$, and $k$. This improves on the $O(n^2 d)$-time algorithm of Eiben et al. [SODA'21] by a factor of $n$.
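To make the distance definition concrete, the following is a minimal sketch, not the paper's algorithm. It assumes an incomplete point is stored as a sequence in which `None` marks an unspecified entry; the names `Point` and `incomplete_distance` are hypothetical. Because the subspaces are axis-parallel, any coordinate left unspecified in either point can be chosen to match the other point, so only coordinates specified in both points contribute to the distance.

```python
import math
from typing import Optional, Sequence

# Hypothetical representation: None marks an unspecified entry.
Point = Sequence[Optional[float]]


def incomplete_distance(a: Point, b: Point) -> float:
    """Euclidean distance between the closest points of two
    axis-parallel affine subspaces given as incomplete points."""
    if len(a) != len(b):
        raise ValueError("points must have the same dimension d")
    squared = 0.0
    for x, y in zip(a, b):
        # A coordinate missing in either point is free, so it can be
        # set equal to the other point's value and contributes zero.
        if x is not None and y is not None:
            squared += (x - y) ** 2
    return math.sqrt(squared)


p = [1.0, None, 3.0]
q = [4.0, 5.0, None]
print(incomplete_distance(p, q))  # 3.0 (only the first coordinate counts)
```

Evaluating this distance takes $O(d)$ time per pair, which is consistent with the $d$ factor in the running times quoted above.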