We propose DRBM-ClustNet, a Bayesian Deep Restricted Boltzmann-Kohonen architecture for data clustering. The core clustering engine uses a Deep Restricted Boltzmann Machine (DRBM) to process unlabeled data by creating new features that are mutually uncorrelated and have large variance. The number of clusters is then predicted using the Bayesian Information Criterion (BIC), and a Kohonen network-based layer performs the clustering. Unlabeled data is processed in three stages so that non-linearly separable datasets can be clustered efficiently. In the first stage, the DRBM performs non-linear feature extraction, capturing a highly complex data representation by projecting feature vectors of $d$ dimensions into $n$ dimensions. Most clustering algorithms require the number of clusters to be fixed a priori, so in the second stage we use BIC to determine it automatically. In the third stage, the number of clusters derived from BIC is passed as input to the Kohonen network, which clusters the feature-extracted data obtained from the DRBM. This method overcomes common disadvantages of clustering algorithms, such as prior specification of the number of clusters, convergence to local optima, and poor clustering accuracy on non-linear datasets. We analyze DRBM-ClustNet on two synthetic datasets, fifteen benchmark datasets from the UCI Machine Learning Repository, and four image datasets. The proposed framework is evaluated on clustering accuracy and ranked against other state-of-the-art clustering methods. The results show that DRBM-ClustNet outperforms these state-of-the-art clustering algorithms.
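The three stages described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the deep RBM is approximated by greedily stacked RBM layers trained with one-step contrastive divergence, BIC is computed under a spherical Gaussian model on candidate partitions, and the Kohonen layer is a one-dimensional map with one unit per cluster. All function names, layer sizes, and hyperparameters (`train_rbm`, `kohonen_cluster`, `bic_score`, `drbm_clustnet_sketch`, `hidden_sizes`, `k_range`) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(X, n_hidden, epochs=50, lr=0.1, seed=0):
    """Train one RBM layer with CD-1 (assumed training rule); X must lie in [0, 1]."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0.0, 0.01, (d, n_hidden))
    b_v, b_h = np.zeros(d), np.zeros(n_hidden)
    for _ in range(epochs):
        h_prob = sigmoid(X @ W + b_h)                      # positive phase
        h_samp = (rng.random(h_prob.shape) < h_prob).astype(float)
        v_rec = sigmoid(h_samp @ W.T + b_v)                # reconstruction
        h_rec = sigmoid(v_rec @ W + b_h)                   # negative phase
        W += lr * (X.T @ h_prob - v_rec.T @ h_rec) / n
        b_v += lr * (X - v_rec).mean(axis=0)
        b_h += lr * (h_prob - h_rec).mean(axis=0)
    return lambda Z: sigmoid(Z @ W + b_h)                  # feature extractor

def kohonen_cluster(F, k, epochs=30, lr0=0.5, seed=0):
    """1-D Kohonen map with k units; returns hard labels and unit weights."""
    rng = np.random.default_rng(seed)
    units = F[rng.choice(len(F), size=k, replace=False)].copy()
    grid = np.arange(k)
    for e in range(epochs):
        frac = 1.0 - e / epochs                            # linear decay schedule
        lr, sigma = lr0 * frac, max(k / 2.0 * frac, 1e-3)
        for x in F[rng.permutation(len(F))]:
            win = np.argmin(((units - x) ** 2).sum(axis=1))
            nbh = np.exp(-((grid - win) ** 2) / (2.0 * sigma ** 2))
            units += lr * nbh[:, None] * (x - units)       # pull neighbours toward x
    dists = ((F[:, None, :] - units[None]) ** 2).sum(axis=2)
    return np.argmin(dists, axis=1), units

def bic_score(F, labels, units):
    """BIC under an assumed spherical Gaussian model; lower is better."""
    n, d = F.shape
    k = len(units)
    var = max(((F - units[labels]) ** 2).sum() / (n * d), 1e-12)
    counts = np.bincount(labels, minlength=k)
    nz = counts[counts > 0]
    log_lik = -0.5 * n * d * (np.log(2 * np.pi * var) + 1.0)
    log_lik += (nz * np.log(nz / n)).sum()                 # mixing proportions
    n_params = k * d + 1 + (k - 1)                         # centres, variance, weights
    return -2.0 * log_lik + n_params * np.log(n)

def drbm_clustnet_sketch(X, hidden_sizes=(16, 8), k_range=range(2, 7), seed=0):
    # Stage 1: scale to [0, 1], then extract features with stacked RBM layers.
    F = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-12)
    for h in hidden_sizes:
        F = train_rbm(F, h, seed=seed)(F)
    # Stage 2: choose the number of clusters with the lowest BIC.
    scores = {}
    for k in k_range:
        labels, units = kohonen_cluster(F, k, seed=seed)
        scores[k] = bic_score(F, labels, units)
    best_k = min(scores, key=scores.get)
    # Stage 3: cluster the extracted features with best_k Kohonen units.
    labels, _ = kohonen_cluster(F, best_k, seed=seed)
    return best_k, labels
```

Calling `drbm_clustnet_sketch(X)` on an array of shape `(samples, d)` returns the selected number of clusters and one label per sample. In this sketch the candidate partitions scored by BIC come from the Kohonen layer itself, which is a convenience of the example and not something the abstract specifies.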