DBSCAN is widely used in many scientific and engineering fields because of its simplicity and practicality. However, because its parameters are highly sensitive, the accuracy of its clustering results depends heavily on practical experience. In this paper, we propose DRL-DBSCAN, a novel deep reinforcement learning guided framework for automatic DBSCAN parameter search. The framework models the process of adjusting the parameter search direction, based on observations of the clustering environment, as a Markov decision process, with the aim of finding the best clustering parameters without manual assistance. DRL-DBSCAN learns the optimal parameter search policy for different feature distributions by interacting with the clusters, and trains its policy network with a weakly supervised reward. In addition, we present a recursive search mechanism, driven by the scale of the data, that processes large parameter spaces efficiently and controllably. Extensive experiments are conducted on five artificial and real-world datasets under the four proposed working modes. The results on offline and online tasks show that DRL-DBSCAN not only consistently improves DBSCAN clustering accuracy, by up to 26% and 25% respectively, but also stably finds the dominant parameters with high computational efficiency. The code is available at https://github.com/RingBDStack/DRL-DBSCAN.
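To make the idea of a parameter search over (Eps, MinPts) concrete, the following is a minimal sketch and not the DRL-DBSCAN implementation. It swaps the learned policy network for plain greedy hill climbing: each step tries the four moves "raise or lower Eps" and "raise or lower MinPts", and stops when no move improves the reward. The reward (pairwise agreement on a handful of labeled points), the step sizes, the function names `dbscan`, `reward`, and `search`, and the toy data are all assumptions made for illustration.

```python
import math
from collections import deque

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, -1 for noise."""
    n = len(points)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels

def reward(labels, known):
    """Assumed weak-supervision reward: pairwise agreement on labeled points."""
    idx = sorted(known)
    pairs = [(a, b) for k, a in enumerate(idx) for b in idx[k + 1:]]
    agree = 0
    for a, b in pairs:
        same_pred = labels[a] == labels[b] and labels[a] != -1
        agree += same_pred == (known[a] == known[b])
    return agree / len(pairs)

def search(points, known, eps=0.1, min_pts=3, eps_step=0.1, max_steps=50):
    """Greedy stand-in for the learned policy: move while the reward improves."""
    best = reward(dbscan(points, eps, min_pts), known)
    for _ in range(max_steps):
        moves = [
            (round(eps + eps_step, 10), min_pts),
            (round(eps - eps_step, 10), min_pts),
            (eps, min_pts + 1),
            (eps, min_pts - 1),
        ]
        scored = [
            (reward(dbscan(points, e, m), known), e, m)
            for e, m in moves
            if e > 0 and m >= 1
        ]
        top, e, m = max(scored)
        if top <= best:
            break  # no move helps: equivalent to a "stop" action
        best, eps, min_pts = top, e, m
    return eps, min_pts, best

# Toy data: two small, well-separated groups of five points each.
group = [(0.0, 0.0), (0.15, 0.0), (0.0, 0.15), (0.15, 0.15), (0.075, 0.075)]
points = group + [(x + 5.0, y + 5.0) for x, y in group]
known = {0: 0, 1: 0, 5: 1, 6: 1}  # only four points carry a label

eps, min_pts, score = search(points, known)
print(eps, min_pts, score)  # -> 0.2 3 1.0
```

A greedy search like this stalls on reward plateaus and re-clusters the data at every candidate, which is the kind of cost a learned search policy and a recursive, scale-driven search are meant to reduce.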