With the popularity of Machine Learning (ML) solutions, algorithms and data have been released faster than the capacity to process them. In this context, the problem of Algorithm Recommendation (AR) has recently received a great deal of attention. The literature addresses this problem as a learning task, often as a Meta-Learning problem in which the aim is to recommend the best alternative for a specific dataset. To this end, datasets encoded by meta-features are explored by ML algorithms that try to learn the mapping between meta-representations and the best technique to be used. One of the challenges for the successful use of ML is to define which features are the most valuable for a specific dataset, since many meta-features can be used, which increases the meta-feature dimension. This paper presents an empirical analysis of Feature Selection and Feature Extraction at the meta-level for the AR problem. The study focused on three criteria: predictive performance, dimensionality reduction, and pipeline runtime. As we verified, applying Dimensionality Reduction (DR) methods did not improve predictive performance in general. However, DR solutions removed about 80% of the meta-features while achieving nearly the same performance as the original setup, with lower runtimes. The only exception was PCA, whose runtime was about the same as that of the original meta-features. Experimental results also showed that many datasets have numerous non-informative meta-features and that high predictive performance can be obtained using around 20% of the original meta-features. Therefore, given the natural tendency of meta-datasets toward high dimensionality, DR methods should be used for Meta-Feature Selection and Meta-Feature Extraction.
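To make the meta-level setup concrete, the following is a minimal illustrative sketch (not the paper's actual pipeline): a toy meta-dataset where each row encodes one dataset via meta-features, a variance-threshold filter as a simple form of meta-feature selection, and a 1-nearest-neighbor meta-learner that recommends an algorithm for a new dataset. All meta-feature names, values, and algorithm labels below are hypothetical.

```python
# Illustrative sketch of meta-feature selection plus a meta-learner.
# The meta-dataset and labels are invented for demonstration only.

import math

# Meta-dataset: rows describe datasets via meta-features; the label is
# the best-performing algorithm observed for that dataset.
META_FEATURES = ["n_instances", "n_features", "class_entropy", "constant_mf"]
META_X = [
    [150.0, 4.0, 1.58, 1.0],
    [1000.0, 20.0, 0.99, 1.0],
    [300.0, 8.0, 1.10, 1.0],
]
META_Y = ["kNN", "RandomForest", "SVM"]

def select_by_variance(rows, names, threshold=1e-9):
    """Keep only meta-feature columns whose variance exceeds the threshold."""
    kept = []
    for j in range(len(names)):
        col = [row[j] for row in rows]
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        if var > threshold:
            kept.append(j)
    return kept

def recommend(new_row, rows, labels, kept):
    """1-NN meta-learner: recommend the algorithm of the closest dataset,
    measuring distance only over the selected meta-features."""
    def dist(a, b):
        return math.sqrt(sum((a[j] - b[j]) ** 2 for j in kept))
    best = min(range(len(rows)), key=lambda i: dist(new_row, rows[i]))
    return labels[best]

kept = select_by_variance(META_X, META_FEATURES)
print([META_FEATURES[j] for j in kept])  # the constant column is dropped
print(recommend([200.0, 5.0, 1.3, 1.0], META_X, META_Y, kept))
```

In this toy example the constant meta-feature is discarded before the meta-learner runs, mirroring the abstract's observation that many meta-features are non-informative; in practice, DR methods such as PCA or supervised feature selection would replace the variance filter.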