Identifying patterns in high-dimensional data without a priori knowledge is an important task in data science. This paper proposes a simple and efficient nonparametric algorithm, Data Convert to Sequence Analysis (DCSA), which dynamically explores each point in the feature space without repetition and thereby finds a directed Hamiltonian path. Based on change point analysis theory, the sequence corresponding to the path is cut into several fragments to achieve clustering. Experiments on real-world datasets from different fields, with dimensions ranging from 4 to 20531, confirm that the method is robust and that its results are visually interpretable.
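The two stages described above (ordering the points along a path that visits each exactly once, then cutting the resulting sequence into fragments) can be illustrated with a minimal sketch. The abstract does not specify how DCSA builds the path or detects change points, so everything here is an assumption made for illustration: the path is built by a greedy nearest-neighbour walk, the change point analysis is replaced by a plain largest-gap cut, and the names `greedy_path` and `cut_sequence` are hypothetical.

```python
import numpy as np


def greedy_path(X, start=0):
    """Visit every point exactly once, always stepping to the nearest
    unvisited point (an assumed stand-in for DCSA's exploration rule).

    Returns the visiting order and the length of each step along it.
    """
    n = len(X)
    visited = np.zeros(n, dtype=bool)
    visited[start] = True
    order, steps = [start], []
    current = start
    for _ in range(n - 1):
        d = np.linalg.norm(X - X[current], axis=1)
        d[visited] = np.inf          # never revisit a point
        nxt = int(np.argmin(d))
        order.append(nxt)
        steps.append(float(d[nxt]))
        visited[nxt] = True
        current = nxt
    return order, np.array(steps)


def cut_sequence(order, steps, n_clusters):
    """Cut the path at its (n_clusters - 1) longest steps.

    This largest-gap rule is a simplification, not the change point
    analysis used in the paper.
    """
    if n_clusters < 2:
        return [list(order)]
    cut_idx = np.sort(np.argsort(steps)[-(n_clusters - 1):])
    fragments, begin = [], 0
    for i in cut_idx:
        # steps[i] joins order[i] and order[i + 1], so the fragment ends at i
        fragments.append(order[begin:i + 1])
        begin = i + 1
    fragments.append(order[begin:])
    return fragments


# Toy data: two well-separated groups of four points each.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
order, steps = greedy_path(X, start=0)
fragments = cut_sequence(order, steps, n_clusters=2)
print(fragments)
```

On this toy input the single long step between the two groups is the cut point, so each fragment corresponds to one group. Plotting `steps` against position in the path gives a one-dimensional view of the data in which such jumps are visible, which is one way to read the abstract's claim of visual interpretability.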