Big data analysis has become a crucial part of new emerging technologies such as the Internet of Things, cyber-physical systems, deep learning, and anomaly detection. Among many other techniques, dimensionality reduction plays a key role in such analyses and facilitates feature selection and feature extraction. Randomized algorithms are efficient tools for handling big data tensors. They accelerate the decomposition of large-scale data tensors by reducing both the computational complexity of deterministic algorithms and the communication among different levels of the memory hierarchy, which is the main bottleneck in modern computing environments and architectures. In this paper, we review recent advances in randomization for the computation of the Tucker decomposition and the Higher Order SVD (HOSVD). We discuss random projection and sampling approaches, single-pass and multi-pass randomized algorithms, and how to utilize them in the computation of the Tucker decomposition and the HOSVD. Simulations on synthetic and real datasets are provided to compare the performance of some of the best and most promising algorithms.
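To make the random projection approach concrete, the following is a minimal NumPy sketch of a randomized HOSVD, assuming a Gaussian test matrix per mode: each mode-n unfolding is compressed by random projection, and an orthonormal factor matrix is recovered from the small sketch. The function name `randomized_hosvd`, the oversampling parameter, and the synthetic toy tensor are illustrative assumptions, not the specific algorithms reviewed in the paper.

```python
import numpy as np

def randomized_hosvd(X, ranks, oversample=10, seed=None):
    """Illustrative randomized HOSVD via Gaussian random projection.

    For each mode n, the mode-n unfolding X_(n) is sketched with a Gaussian
    test matrix; the leading left singular vectors of the sketch serve as
    the factor matrix U_n. The core tensor is then obtained by contracting
    X with the transposed factors (mode-n products).
    """
    rng = np.random.default_rng(seed)
    factors = []
    for n in range(X.ndim):
        # Mode-n unfolding: bring mode n to the front, flatten the rest.
        Xn = np.moveaxis(X, n, 0).reshape(X.shape[n], -1)
        # Random projection: compress the many columns of X_(n) into a
        # thin sketch Y whose range approximates the range of X_(n).
        Omega = rng.standard_normal((Xn.shape[1], ranks[n] + oversample))
        Y = Xn @ Omega
        # Orthonormal factor from the sketch, truncated to the target rank.
        U, _, _ = np.linalg.svd(Y, full_matrices=False)
        factors.append(U[:, :ranks[n]])
    # Core tensor: G = X  x_1 U1^T  x_2 U2^T  ...  (mode-n products).
    G = X
    for n, U in enumerate(factors):
        G = np.moveaxis(np.tensordot(U.T, np.moveaxis(G, n, 0), axes=1), 0, n)
    return G, factors

# Toy check on a synthetic tensor of multilinear rank at most (5, 5, 5).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
B = rng.standard_normal((60, 5))
C = rng.standard_normal((70, 5))
X = np.einsum('ir,jr,kr->ijk', A, B, C)
G, Us = randomized_hosvd(X, ranks=(5, 5, 5), seed=0)
X_hat = G
for n, U in enumerate(Us):
    X_hat = np.moveaxis(np.tensordot(U, np.moveaxis(X_hat, n, 0), axes=1), 0, n)
print(np.linalg.norm(X_hat - X) / np.linalg.norm(X))  # near machine precision here
```

The oversampling parameter pads the sketch with a few extra random directions so that the leading subspace of each unfolding is captured with high probability; single-pass variants discussed in the paper additionally avoid revisiting X when forming the core.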