The widespread adoption of machine learning (ML) techniques and the extensive expertise required to apply them have led to increased interest in automated ML solutions that reduce the need for human intervention. One of the main challenges in applying ML to previously unseen problems is algorithm selection: the identification of high-performing algorithms for a given dataset, task, and evaluation measure. This study addresses the algorithm selection challenge for data clustering, a fundamental data mining task that aims to group similar objects. We present MARCO-GE, a novel meta-learning approach for the automated recommendation of clustering algorithms. MARCO-GE first transforms datasets into graphs and then uses a graph convolutional neural network technique to extract their latent representations. Using the resulting embeddings, MARCO-GE trains a ranking meta-model capable of accurately recommending top-performing algorithms for a new dataset and clustering evaluation measure. Extensive evaluation on 210 datasets, 13 clustering algorithms, and 10 clustering measures demonstrates the effectiveness of our approach and its superiority in terms of predictive and generalization performance over state-of-the-art clustering meta-learning approaches.
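To make the first two stages concrete, the following is a minimal illustrative sketch, not the MARCO-GE implementation. The abstract does not state how datasets are converted into graphs, so a symmetric k-nearest-neighbour graph is assumed here. The graph convolution is reduced to a single untrained propagation step with the standard symmetric normalisation, and the node features are mean-pooled into one vector per dataset. The names `knn_graph`, `propagate`, and `dataset_embedding` are hypothetical, and the ranking meta-model is not shown.

```python
import math


def knn_graph(points, k):
    """Symmetric k-nearest-neighbour adjacency with self-loops (assumed graph construction)."""
    n = len(points)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop, as in the usual GCN normalisation
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for _, j in dists[:k]:
            adj[i][j] = adj[j][i] = 1.0
    return adj


def propagate(adj, feats):
    """One untrained GCN-style step: D^{-1/2} A D^{-1/2} X, with no weights or nonlinearity."""
    n, dim = len(adj), len(feats[0])
    deg = [sum(row) for row in adj]
    out = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if adj[i][j]:
                w = adj[i][j] / math.sqrt(deg[i] * deg[j])
                for d in range(dim):
                    out[i][d] += w * feats[j][d]
    return out


def dataset_embedding(points, k=2):
    """Mean-pool the propagated node features into one fixed-length vector per dataset."""
    hidden = propagate(knn_graph(points, k), points)
    n = len(hidden)
    return [sum(row[d] for row in hidden) / n for d in range(len(hidden[0]))]


# Toy dataset (made up for illustration): two well-separated groups of three points.
toy_dataset = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
embedding = dataset_embedding(toy_dataset, k=2)
```

In a full pipeline, vectors like `embedding` would be computed for every training dataset and passed, together with the observed algorithm rankings, to a learned ranking model. A trained graph convolutional network would replace the fixed propagation step used here.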