Clustering is an unsupervised machine learning method that groups data samples into clusters of similar objects. In practice, clustering has been used in numerous applications, such as bank customer profiling, document retrieval, image segmentation, and e-commerce recommendation engines. However, existing clustering techniques have significant limitations, one of which is the dependence of their stability on initialization parameters (e.g., the number of clusters and the initial centroids). Different solutions have been presented in the literature to overcome this limitation (i.e., internal and external validation metrics). However, these solutions incur high computational complexity and memory consumption, especially when dealing with high-dimensional data. In this paper, we apply a recent deep learning (DL) object detection model, YOLO-v5, to detect the initial clustering parameters, such as the number of clusters, their sizes, and possible centroids. The proposed solution consists mainly of adding a DL-based initialization phase that makes clustering algorithms initialization-free. The results show that the proposed solution can provide near-optimal cluster initialization parameters with low computational and resource overhead compared to existing solutions.
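The sensitivity to initialization mentioned above can be illustrated with a minimal sketch. It is not the method proposed in this paper: it uses plain Lloyd's k-means as a stand-in clustering algorithm, and the toy data set, the function names (`sq_dist`, `kmeans`), and the two sets of initial centroids are hypothetical choices made only for illustration. The same data are clustered twice with the same number of clusters, and only the initial centroids differ.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's k-means started from the given initial centroids.

    Returns (final_centroids, inertia), where inertia is the sum of
    squared distances from each point to its nearest centroid.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else old
            for members, old in zip(clusters, centroids)
        ]
        if updated == centroids:  # converged: nothing moved
            break
        centroids = updated
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, inertia


# Hypothetical toy data: three well-separated groups of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (5, 10), (5, 11), (6, 10), (6, 11),
]

# Assumed "good" start: one initial centroid inside each group.
_, good_inertia = kmeans(points, [(0, 0), (10, 0), (5, 10)])

# Assumed "poor" start: two initial centroids inside the same group.
_, bad_inertia = kmeans(points, [(0, 0), (1, 1), (8, 5)])

print("inertia with good initialization:", good_inertia)
print("inertia with poor initialization:", bad_inertia)
```

With the poor start, two centroids remain inside one group while the third has to cover the other two groups, so the algorithm converges to a clearly worse partition even though the number of clusters is correct. This is the kind of instability that an automatic initialization phase is meant to avoid.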