Most dimensionality reduction methods employ frequency-domain representations obtained from matrix diagonalization and may be inefficient for large datasets with relatively high intrinsic dimensions. To address this challenge, Correlated Clustering and Projection (CCP) offers a novel data-domain strategy that does not require solving any matrix problem. CCP partitions high-dimensional features into correlated clusters and then projects the correlated features in each cluster into a one-dimensional representation based on sample correlations. Residue-Similarity (R-S) scores and indexes, the shape of data in Riemannian manifolds, and the algebraic-topology-based persistent Laplacian are introduced for visualization and analysis. The proposed methods are validated on benchmark datasets in combination with various machine learning algorithms.
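The two steps named in the abstract (cluster correlated features, then project each cluster to one coordinate per sample) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature dissimilarity `1 - |Pearson r|`, the k-medoids partitioning with farthest-first initialization, and the Gaussian kernel sum used for the projection are all choices made here for illustration, and the name `ccp_sketch` is hypothetical.

```python
import numpy as np


def ccp_sketch(X, n_clusters, n_iter=20):
    """Toy CCP-style reduction: (n_samples, n_features) -> (n_samples, n_clusters).

    Hypothetical helper; every modelling choice below is an assumption.
    """
    n_samples, n_features = X.shape

    # Step 1: feature-feature dissimilarity from absolute Pearson correlation.
    # Constant features yield NaN correlations, which are treated as 0 here.
    corr = np.nan_to_num(np.corrcoef(X, rowvar=False))
    dist = 1.0 - np.abs(corr)
    np.fill_diagonal(dist, 0.0)

    # Step 2: partition the features with a plain k-medoids loop.
    # Farthest-first initialization keeps the sketch deterministic.
    medoids = [0]
    while len(medoids) < n_clusters:
        nearest = dist[:, medoids].min(axis=1)
        medoids.append(int(np.argmax(nearest)))
    medoids = np.array(medoids)

    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(n_clusters):
            members = np.flatnonzero(labels == c)
            if members.size:  # keep the old medoid if a cluster is empty
                within = dist[np.ix_(members, members)].sum(axis=0)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)

    # Step 3: project each feature cluster to one value per sample by summing
    # a Gaussian kernel over pairwise sample distances, computed using only
    # that cluster's features. This costs O(n_samples^2) memory per cluster.
    Z = np.zeros((n_samples, n_clusters))
    for c in range(n_clusters):
        sub = X[:, labels == c]
        if sub.shape[1] == 0:
            continue
        sq = ((sub[:, None, :] - sub[None, :, :]) ** 2).sum(axis=2)
        scale = sq.mean() + 1e-12  # assumed bandwidth: mean squared distance
        Z[:, c] = np.exp(-sq / scale).sum(axis=1)
    return Z, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 8))
    Z, labels = ccp_sketch(X, n_clusters=2)
    print(Z.shape, labels)
```

The output has one column per feature cluster, and the sketch never calls an eigenvalue or singular value routine, which is the property the abstract emphasizes. The kernel, its bandwidth, and the clustering routine would all need to be replaced by the choices made in the paper to reproduce its results.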