We give a dimensionality reduction procedure to approximate the sum of distances of a given set of $n$ points in $R^d$ to any "shape" that lies in a $k$-dimensional subspace. Here, by "shape" we mean any set of points in $R^d$. Our algorithm takes as input an $n \times d$ matrix $A$, where each row of $A$ denotes a data point, and outputs a subspace $P$ of dimension $O(k^{3}/\epsilon^6)$ such that the projections of the $n$ points onto $P$, together with the distances of the points to $P$, suffice to obtain an $\epsilon$-approximation to the sum of distances to any shape that lies in a $k$-dimensional subspace of $R^d$. This setting includes important problems such as $k$-median, $k$-subspace approximation, and $(j,l)$ subspace clustering with $j \cdot l \leq k$. The dimensionality reduction lowers the data storage requirement from nnz$(A)$, which can be as large as $nd$, to $(n+d)k^{3}/\epsilon^6$. Our algorithm runs in time nnz$(A)/\epsilon^2 + (n+d)$poly$(k/\epsilon)$, up to logarithmic factors. For dense matrices, where nnz$(A) \approx nd$, we give a faster algorithm that runs in time $nd + (n+d)$poly$(k/\epsilon)$, up to logarithmic factors. Our dimensionality reduction algorithm can also be used to obtain poly$(k/\epsilon)$-size coresets for the $k$-median and $(k,1)$-subspace approximation problems in polynomial time.
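To make concrete what is stored after the reduction, the following minimal sketch keeps only the coordinates of each projected point in $P$ and each point's distance to $P$, then estimates the sum of distances to a finite shape. Several parts are assumptions for illustration and are not taken from the abstract: the subspace is a random stand-in rather than the subspace the algorithm constructs, the helper names `compress`, `estimate_cost`, and `exact_cost` are hypothetical, and the estimate combines the two stored quantities by the Pythagorean rule $\sqrt{\mathrm{dist}(Pa_i, S)^2 + \mathrm{dist}(a_i, P)^2}$. That rule is exact when the shape $S$ lies inside $P$, which is the case checked here; for a shape in an arbitrary $k$-dimensional subspace the $\epsilon$-approximation relies on the specific subspace the algorithm outputs.

```python
import numpy as np

def compress(A, basis):
    """Return what is stored: coordinates in P and distances to P.

    basis is a d x m matrix with orthonormal columns spanning P
    (a stand-in here, not the subspace constructed by the algorithm).
    """
    coords = A @ basis                         # n x m coordinates of projections
    residual = A - coords @ basis.T            # components orthogonal to P
    dist_to_P = np.linalg.norm(residual, axis=1)
    return coords, dist_to_P

def estimate_cost(coords, dist_to_P, basis, shape_points):
    """Estimate the sum of distances to a finite shape (rows of shape_points).

    Assumed combination rule: sqrt(dist(projection, S)^2 + dist(point, P)^2).
    """
    proj = coords @ basis.T                    # projected points back in R^d
    diffs = proj[:, None, :] - shape_points[None, :, :]
    d_proj = np.linalg.norm(diffs, axis=2).min(axis=1)
    return np.sqrt(d_proj**2 + dist_to_P**2).sum()

def exact_cost(A, shape_points):
    """Sum over points of the distance to the nearest shape point."""
    diffs = A[:, None, :] - shape_points[None, :, :]
    return np.linalg.norm(diffs, axis=2).min(axis=1).sum()

rng = np.random.default_rng(0)
n, d, m = 50, 20, 5
A = rng.normal(size=(n, d))
basis, _ = np.linalg.qr(rng.normal(size=(d, m)))   # d x m, orthonormal columns
centers = rng.normal(size=(3, m)) @ basis.T        # 3 centers lying inside P

coords, dist_to_P = compress(A, basis)
est = estimate_cost(coords, dist_to_P, basis, centers)
exact = exact_cost(A, centers)
print(est, exact)
```

The stored data consist of an $n \times m$ coordinate array, $n$ distances, and a $d \times m$ basis, so on the order of $(n+d)m$ numbers, which mirrors the $(n+d)k^{3}/\epsilon^6$ storage bound when $m = O(k^{3}/\epsilon^6)$.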