Feature Selection for Classification with QAOA

Feature selection is of great importance in machine learning, where it is used to reduce the dimensionality of classification, ranking and prediction problems. Removing redundant and noisy features can improve both the accuracy and the scalability of the trained models. However, feature selection is computationally expensive: its solution space grows combinatorially with the number of features. In this work, we consider a quadratic feature selection problem that can be tackled with the Quantum Approximate Optimization Algorithm (QAOA), which has already been employed in combinatorial optimization. First, we formulate the feature selection problem as a Quadratic Unconstrained Binary Optimization (QUBO) problem, which is then mapped to an Ising spin Hamiltonian. Then we apply QAOA with the goal of finding the ground state of this Hamiltonian, which corresponds to the optimal selection of features. In our experiments, we consider seven real-world datasets with dimensionality up to 21 and run QAOA both on a quantum simulator and, for the small datasets, on the 7-qubit IBM quantum computer ibm-perth. We use the selected features to train a classification model and evaluate its accuracy. Our analysis shows that the feature selection problem can be tackled with QAOA and that currently available quantum devices can be used effectively. Future studies could test a wider range of classification models and improve the effectiveness of QAOA by exploring better-performing optimizers for its classical step.
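The pipeline described above (QUBO, then Ising Hamiltonian, then QAOA) can be sketched on a toy problem with plain NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the 3-feature QUBO matrix `Q` is hypothetical, all helper names are invented here, QAOA is simulated exactly as a depth p = 1 statevector, and the classical step is a crude grid search over (γ, β) rather than the optimizer used in the paper. The substitution x_i = (1 − z_i)/2 maps each binary selection variable x_i ∈ {0, 1} to a spin z_i ∈ {+1, −1}.

```python
import itertools
import numpy as np

def qubo_energy(x, Q):
    """E(x) = x^T Q x for a binary vector x (x_i = 1 means feature i is selected)."""
    x = np.asarray(x, dtype=float)
    return float(x @ Q @ x)

def qubo_to_ising(Q):
    """Rewrite x^T Q x as sum_i h_i z_i + sum_{i<j} J_ij z_i z_j + offset, using x_i = (1 - z_i)/2."""
    Q = np.asarray(Q, dtype=float)
    S = (Q + Q.T) / 2  # symmetric part; x^T Q x == x^T S x
    n = S.shape[0]
    h, J, offset = np.zeros(n), np.zeros((n, n)), 0.0
    for i in range(n):
        # S_ii x_i = S_ii (1 - z_i) / 2   (since x_i^2 = x_i)
        offset += S[i, i] / 2
        h[i] -= S[i, i] / 2
        for j in range(i + 1, n):
            # 2 S_ij x_i x_j = (S_ij / 2) (1 - z_i - z_j + z_i z_j)
            c = S[i, j] / 2
            offset += c
            h[i] -= c
            h[j] -= c
            J[i, j] += c  # J is kept upper-triangular
    return h, J, offset

def ising_energy(z, h, J, offset):
    z = np.asarray(z, dtype=float)
    return float(h @ z + z @ J @ z + offset)

def cost_diagonal(Q):
    """Cost of every basis state; bit i of index k is qubit (feature) i."""
    n = Q.shape[0]
    return np.array([qubo_energy([(k >> i) & 1 for i in range(n)], Q) for k in range(2 ** n)])

def qaoa_p1_state(C, gamma, beta):
    """Depth-1 QAOA state exp(-i beta sum_i X_i) exp(-i gamma C) |+>^n.
    Using the QUBO cost instead of the Ising form only changes a global phase."""
    n = len(C).bit_length() - 1
    psi = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)  # uniform superposition |+>^n
    psi = psi * np.exp(-1j * gamma * C)                   # cost unitary is diagonal
    c, s = np.cos(beta), np.sin(beta)
    for i in range(n):
        v = psi.reshape(2 ** (n - i - 1), 2, 2 ** i)      # axis 1 is qubit i
        a0, a1 = v[:, 0, :], v[:, 1, :]
        # exp(-i beta X) = cos(beta) I - i sin(beta) X on qubit i
        psi = np.stack([c * a0 - 1j * s * a1, -1j * s * a0 + c * a1], axis=1).reshape(-1)
    return psi

def qaoa_p1_grid(C, n_grid=40):
    """Grid search for the (gamma, beta) minimising the expected cost <C>."""
    best = (np.inf, 0.0, 0.0)
    for gamma in np.linspace(0, 2 * np.pi, n_grid):
        for beta in np.linspace(0, np.pi, n_grid, endpoint=False):
            p = np.abs(qaoa_p1_state(C, gamma, beta)) ** 2
            exp_cost = float(p @ C)
            if exp_cost < best[0]:
                best = (exp_cost, gamma, beta)
    return best

# Hypothetical toy QUBO (upper-triangular): negative diagonal rewards selecting a
# feature, positive off-diagonal entries penalise selecting redundant pairs.
Q = np.array([[-1.0, 0.8, 0.1],
              [0.0, -0.9, 0.2],
              [0.0, 0.0, -0.5]])
h, J, offset = qubo_to_ising(Q)
max_err = max(abs(qubo_energy(x, Q) - ising_energy(1 - 2 * np.array(x), h, J, offset))
              for x in itertools.product([0, 1], repeat=3))

C = cost_diagonal(Q)
k_opt = int(np.argmin(C))
x_opt = [(k_opt >> i) & 1 for i in range(3)]  # brute-force ground state
exp_cost, gamma, beta = qaoa_p1_grid(C)
probs = np.abs(qaoa_p1_state(C, gamma, beta)) ** 2
k_top = int(np.argmax(probs))
print("QUBO/Ising max mismatch:", max_err)
print("ground state:", x_opt, "energy:", C[k_opt])
print("QAOA p=1 <C>:", exp_cost, "most likely:", [(k_top >> i) & 1 for i in range(3)])
```

Because the statevector has 2^n entries, this exact simulation is only practical for small n; the brute-force `x_opt` is included as a reference against which the QAOA output can be compared.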



Feature Selection


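Feature selection, also called feature subset selection (FSS) or attribute selection, is the process of choosing N features from an existing set of M features so that a specific performance criterion of the system is optimized. It selects the most effective features from the original ones in order to reduce the dimensionality of the dataset; it is an important means of improving the performance of learning algorithms and a key data preprocessing step in pattern recognition. For a learning algorithm, good training samples are the key to training a good model.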
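Choosing exactly N of M features can itself be written as a QUBO by adding a quadratic penalty A(Σ_i x_i − N)^2 to a quadratic score. The sketch below assumes a hypothetical relevance/redundancy score built from absolute Pearson correlations (not necessarily the scoring used in the QAOA paper above), uses synthetic data, and checks by exhaustive enumeration that the QUBO minimum coincides with the best N-feature subset. The penalty weight A is chosen larger than the score's range, which guarantees that any subset of the wrong size costs more than every subset of the right size.

```python
import itertools
import numpy as np

def feature_selection_qubo(X, y, n_select, alpha=1.0):
    """Upper-triangular QUBO matrix Q for picking n_select of the M columns of X.

    Hypothetical scoring (an assumption for illustration):
      relevance  r_i  = |corr(X_i, y)|
      redundancy R_ij = |corr(X_i, X_j)| for i < j
      f(x) = -sum_i r_i x_i + alpha * sum_{i<j} R_ij x_i x_j
    plus the penalty A * (sum_i x_i - n_select)^2 with its constant A * n_select^2 dropped.
    """
    M = X.shape[1]
    corr = np.abs(np.corrcoef(np.column_stack([X, y]), rowvar=False))
    r = corr[:M, M]
    R = np.triu(corr[:M, :M], k=1)
    A = 1.0 + r.sum() + alpha * R.sum()  # exceeds max f - min f, so the size constraint dominates
    Q = alpha * R + np.triu(np.full((M, M), 2.0 * A), k=1)  # pair terms: alpha R_ij + 2A
    Q[np.diag_indices(M)] = -r + A * (1 - 2 * n_select)  # linear terms, using x_i^2 = x_i
    return Q, r, R

def objective(x, r, R, alpha=1.0):
    """Unpenalised relevance/redundancy score f(x); lower is better."""
    x = np.asarray(x, dtype=float)
    return float(-r @ x + alpha * (x @ R @ x))

def argmin_qubo(Q):
    """Exhaustive minimiser of x^T Q x over all 2^M binary vectors (what QAOA approximates)."""
    M = Q.shape[0]
    candidates = (np.array(bits) for bits in itertools.product([0, 1], repeat=M))
    return min(candidates, key=lambda x: float(x @ Q @ x))

def best_subset(r, R, n_select, alpha=1.0):
    """Exhaustive minimiser of f over the C(M, n_select) subsets of exactly n_select features."""
    M = len(r)
    best_x, best_f = None, np.inf
    for idx in itertools.combinations(range(M), n_select):
        x = np.zeros(M)
        x[list(idx)] = 1.0
        f = objective(x, r, R, alpha)
        if f < best_f:
            best_x, best_f = x, f
    return best_x

# Synthetic data (illustrative only): feature 2 is a noisy copy of feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=200)
y = X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=200)

Q, r, R = feature_selection_qubo(X, y, n_select=3)
x_qubo = argmin_qubo(Q)
x_enum = best_subset(r, R, n_select=3)
print("QUBO picks features", np.flatnonzero(x_qubo), "f =", objective(x_qubo, r, R))
print("enumeration picks  ", np.flatnonzero(x_enum), "f =", objective(x_enum, r, R))
```

Both searches are exhaustive here (2^6 = 64 bit strings versus C(6, 3) = 20 subsets); the point of the QUBO form is that it drops the explicit size constraint, so the problem can be handed to an Ising solver such as QAOA.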
