Big data analysis has become a crucial part of new emerging technologies such as the Internet of Things, cyber-physical analysis, deep learning, and anomaly detection. Among many other techniques, dimensionality reduction plays a key role in such analyses and facilitates feature selection and feature extraction. Randomized algorithms are efficient tools for handling big data tensors: they accelerate the decomposition of large-scale data tensors by reducing both the computational complexity of deterministic algorithms and the communication among different levels of the memory hierarchy, which is the main bottleneck in modern computing environments and architectures. In this paper, we review recent advances in randomization for the computation of the Tucker decomposition and the Higher Order SVD (HOSVD). We discuss random projection and sampling approaches, as well as single-pass and multi-pass randomized algorithms, and how to utilize them in computing the Tucker decomposition and the HOSVD. Simulations on synthetic and real datasets are provided to compare the performance of some of the best and most promising algorithms.
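As a concrete illustration of the random-projection idea surveyed above, the following minimal NumPy sketch compresses a tensor mode by mode: each mode-n unfolding is multiplied by a Gaussian test matrix, the leading left singular vectors of the small sketch serve as the factor matrix, and the core is obtained by multilinear contraction. This is a generic randomized HOSVD sketch, not the paper's specific algorithms; the names (`unfold`, `mode_mult`, `randomized_hosvd`), the oversampling parameter, and the test sizes are illustrative assumptions.

```python
# Minimal sketch of a randomized HOSVD via mode-wise random projection.
# Assumes only NumPy; names and parameters are illustrative, not from the paper.
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: bring `mode` to the front, flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_mult(T, M, mode):
    """Mode-n product T x_n M, where M has shape (new_dim, T.shape[mode])."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

def randomized_hosvd(T, ranks, oversample=5, seed=0):
    """Sketch each mode-n unfolding with a Gaussian matrix, keep the leading
    left singular vectors of the sketch as the factor U_n, then form the core
    G = T x_1 U_1^T x_2 U_2^T ... x_N U_N^T."""
    rng = np.random.default_rng(seed)
    factors = []
    for n, r in enumerate(ranks):
        Tn = unfold(T, n)                                # I_n x prod of I_m
        Omega = rng.standard_normal((Tn.shape[1], r + oversample))
        Y = Tn @ Omega                                   # random projection
        U, _, _ = np.linalg.svd(Y, full_matrices=False)  # range estimate
        factors.append(U[:, :r])
    G = T
    for n, U in enumerate(factors):
        G = mode_mult(G, U.T, n)
    return G, factors

# Usage: recover a synthetic 40 x 40 x 40 tensor of multilinear rank (5, 5, 5).
rng = np.random.default_rng(1)
T = rng.standard_normal((5, 5, 5))
for n in range(3):
    T = mode_mult(T, rng.standard_normal((40, 5)), n)
G, Us = randomized_hosvd(T, (5, 5, 5))
T_hat = G
for n, U in enumerate(Us):
    T_hat = mode_mult(T_hat, U, n)
print("relative error:", np.linalg.norm(T - T_hat) / np.linalg.norm(T))
```

For exactly low-rank data as in this example, the Gaussian sketch captures each mode subspace almost surely and the reconstruction error is at machine-precision level; for noisy data, the oversampling parameter (and, if needed, power iterations) controls the accuracy of the range estimate.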