Low-rank approximation of data streams is a fundamental task in computing science, machine learning, and statistics. Many streaming algorithms have emerged over the years, most of them inspired by randomized algorithms, specifically sketching methods. However, many of these algorithms cannot exploit information in the data stream and consequently suffer from low accuracy. Existing data-driven methods improve accuracy, but their training cost is high in practice. In this paper, we take a subspace perspective and propose a tensor-based sketching method for low-rank approximation of data streams. The proposed algorithm fully exploits the structure of the data stream and obtains quasi-optimal sketching matrices by performing tensor decomposition on training data. Experiments show that the proposed tensor-based method can be more accurate and much faster than previous work.
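For readers unfamiliar with sketching, the following minimal sketch illustrates the generic idea that the abstract builds on: compress a matrix with a random sketching matrix, then recover a low-rank factorization from the compressed data. It is not the tensor-based method proposed in the paper. It assumes a data-oblivious Gaussian sketching matrix and a two-pass scheme, whereas a true streaming algorithm would work in a single pass. The function name `sketch_low_rank` and its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def sketch_low_rank(A, rank, oversample=5, seed=0):
    """Rank-`rank` approximation of A from a Gaussian random sketch.

    Illustrative only: a generic two-pass randomized scheme,
    not the tensor-based method described in the abstract.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Data-oblivious sketching matrix (assumption); a learned,
    # data-driven sketching matrix would take its place.
    S = rng.standard_normal((n, rank + oversample))
    Y = A @ S                    # sketch capturing the column space of A
    Q, _ = np.linalg.qr(Y)       # orthonormal basis for the sketched range
    B = Q.T @ A                  # second pass: project A onto that basis
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ U[:, :rank], s[:rank], Vt[:rank]

# Synthetic test matrix of exact rank 8.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 8)) @ rng.standard_normal((8, 100))

U, s, Vt = sketch_low_rank(A, rank=8)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
print(f"relative error: {rel_err:.2e}")
```

Because the sketching matrix here is drawn without looking at the data, its accuracy on real streams depends only on the oversampling. Data-driven approaches such as the one described above instead choose the sketching matrix from training data.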