In a Big Data setting, computing the dominant SVD factors is restrictive due to main memory requirements. Recently introduced streaming Randomized SVD schemes work under the restrictive assumption that the singular value spectrum of the data decays exponentially. This is seldom true for practical data. Although these methods are claimed to be applicable to scientific computations by virtue of their tail-energy error bounds, the approximation errors in the singular vectors and values are high when this assumption does not hold. Furthermore, from a practical perspective, oversampling can still be memory intensive or, worse, can exceed the feature dimension of the data. To address these issues, we present Range-Net as an alternative to randomized SVD that satisfies the tail-energy lower bound given by the Eckart-Young-Mirsky (EYM) theorem. Range-Net is a deterministic two-stage neural optimization approach with random initialization, whose main memory requirement depends explicitly on the feature dimension and the desired rank, independent of the sample dimension. Data samples are read in a streaming setting, with the network minimization problem converging to the desired rank-r approximation. Range-Net is fully interpretable: all network outputs and weights have a specific meaning. We provide theoretical guarantees that the SVD factors extracted by Range-Net satisfy the EYM tail-energy lower bound at machine precision. Our numerical experiments on real data at various scales confirm this bound. A comparison against state-of-the-art streaming Randomized SVD shows that Range-Net is more accurate by six orders of magnitude while being memory efficient.
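The EYM tail-energy lower bound referenced above can be made concrete with a minimal sketch (this is not the Range-Net method itself, just an illustration of the bound using a dense SVD): the best rank-r approximation error in the Frobenius norm equals the tail energy, i.e., the root-sum-square of the discarded singular values.

```python
import numpy as np

# Illustrative sketch of the Eckart-Young-Mirsky (EYM) theorem, assuming a
# small random test matrix; the achieved rank-r truncation error equals the
# tail energy of the discarded singular values.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
r = 10  # desired rank

U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_r = (U[:, :r] * s[:r]) @ Vt[:r]        # rank-r truncated SVD
err = np.linalg.norm(X - X_r, "fro")     # achieved approximation error
tail = np.sqrt(np.sum(s[r:] ** 2))       # EYM tail-energy lower bound

assert np.isclose(err, tail)             # bound attained (up to machine precision)
```

Any rank-r factorization, however it is computed, can only match this tail energy, never beat it; the abstract's claim is that Range-Net attains it at machine precision without materializing the full data matrix in memory.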