Singular Value Decomposition (SVD) serves as an exploratory tool for identifying the dominant features of a dataset in the form of the top rank-r singular factors, i.e. those corresponding to the largest singular values. For big data applications, SVD is well known to be restrictive because of its main-memory requirements. However, a number of applications, such as community detection, clustering, and bottleneck identification in large-scale graph datasets, rely instead on identifying the lowest singular values and the corresponding singular vectors. For example, the lowest singular values of a graph Laplacian reveal the number of isolated clusters (zero singular values) or bottlenecks (lowest non-zero singular values) for undirected, acyclic graphs. A naive approach would be to perform a full SVD; however, this quickly becomes infeasible for practical big data applications because of the enormous memory requirements. Furthermore, such applications need only a few of the lowest singular factors, which makes a full decomposition computationally exorbitant. In this work, we trivially extend the previously proposed Range-Net to \textbf{Tail-Net} for memory- and compute-efficient extraction of the lowest singular factors of a given big dataset for a specified rank r. We present a number of numerical experiments on both synthetic and practical datasets for verification and benchmarking, using conventional SVD as the baseline.
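To make the graph Laplacian example concrete, the following is a minimal sketch of the naive full-SVD baseline on a toy graph; it is not the Tail-Net method. The helper names `graph_laplacian` and `smallest_singular_factors`, the six-node graph, and the `1e-8` tolerance are all assumptions made for illustration. The graph consists of two disconnected three-node paths, so it is undirected and acyclic.

```python
import numpy as np


def graph_laplacian(n_nodes, edges):
    """Unnormalized Laplacian L = D - A of an undirected graph (hypothetical helper)."""
    A = np.zeros((n_nodes, n_nodes))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    D = np.diag(A.sum(axis=1))
    return D - A


def smallest_singular_factors(M, r):
    """Naive baseline: full dense SVD, then keep the r smallest singular factors.

    This is the memory-hungry approach the abstract argues against; it is used
    here only because the toy matrix is tiny.
    """
    U, s, Vt = np.linalg.svd(M)            # singular values come back in descending order
    idx = np.arange(len(s))[::-1][:r]      # indices of the r smallest, in ascending order
    return U[:, idx], s[idx], Vt[idx, :]


# Toy graph (assumption): two disconnected paths, 0-1-2 and 3-4-5.
edges = [(0, 1), (1, 2), (3, 4), (4, 5)]
L = graph_laplacian(6, edges)

U_tail, s_tail, Vt_tail = smallest_singular_factors(L, r=3)

# Zero singular values count the isolated clusters (connected components).
n_clusters = int(np.sum(s_tail < 1e-8))
# The smallest non-zero singular value indicates how weakly a component is held together.
smallest_nonzero = s_tail[s_tail >= 1e-8].min()

print("tail singular values:", s_tail)
print("isolated clusters:", n_clusters)
print("smallest non-zero singular value:", smallest_nonzero)
```

For this toy graph the count of near-zero singular values matches the two disconnected paths. On a large graph the dense `np.linalg.svd` call is exactly the step that becomes infeasible, which is the motivation for extracting only the tail factors.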