Feature selection is a technique in statistical prediction modeling that identifies features in a record with a strong statistical connection to the target variable. Excluding features with a weak statistical connection to the target variable during training not only reduces the dimension of the data, which decreases the time complexity of the algorithm, but also decreases noise within the data, which helps to avoid overfitting. In all, feature selection assists in training a robust statistical model that performs well and is stable. Given the lack of scalability in classical computation, current techniques consider only the predictive power of each feature and not the redundancy between the features themselves. Recent advancements in feature selection that leverage quantum annealing (QA) give a scalable technique that aims to maximize the predictive power of the features while minimizing redundancy between them. As a consequence, this algorithm is expected to improve the bias/variance trade-off, yielding better features for training a statistical model. This paper tests this intuition against classical methods by utilizing open-source data sets and evaluating the efficacy of each trained statistical model built with well-known prediction algorithms. The numerical results display an advantage when utilizing the features selected by the algorithm that leverages QA.
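The "maximize predictive power while minimizing redundancy" objective described above is commonly cast as a QUBO (quadratic unconstrained binary optimization) problem, which is the form a quantum annealer samples. The sketch below is a minimal, hypothetical illustration of that idea, not the paper's actual method: mutual information with the target rewards relevance on the QUBO diagonal, pairwise mutual information between features penalizes redundancy off the diagonal, and the toy data, the redundancy weight `alpha`, and the brute-force minimizer (standing in for an annealer) are all assumptions made for the example.

```python
from itertools import combinations
from math import log2

def mutual_info(x, y):
    """Mutual information (in bits) between two binary sequences."""
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            pxy = sum(1 for xi, yi in zip(x, y) if xi == a and yi == b) / n
            px = sum(1 for xi in x if xi == a) / n
            py = sum(1 for yi in y if yi == b) / n
            if pxy > 0:
                mi += pxy * log2(pxy / (px * py))
    return mi

def qubo_select(features, target, k, alpha):
    """Build a QUBO whose diagonal rewards relevance to the target and whose
    off-diagonal penalizes pairwise redundancy, then minimize it exactly over
    all size-k subsets. A quantum annealer would instead sample low-energy
    states of the same QUBO, which is where the scalability gain comes from."""
    m = len(features)
    Q = [[0.0] * m for _ in range(m)]
    for i in range(m):
        Q[i][i] = -mutual_info(features[i], target)  # relevance (negated: we minimize)
        for j in range(i + 1, m):
            # alpha is a hypothetical weight trading relevance against redundancy
            Q[i][j] = alpha * mutual_info(features[i], features[j])
    best, best_energy = None, float("inf")
    for subset in combinations(range(m), k):
        energy = sum(Q[i][j] for i in subset for j in subset if i <= j)
        if energy < best_energy:
            best, best_energy = subset, energy
    return best

# Toy data: f0 tracks the target exactly, f1 is a near-duplicate of f0,
# f2 is independent of the target (pure noise).
y  = [0, 0, 1, 1, 0, 1, 0, 1]
f0 = [0, 0, 1, 1, 0, 1, 0, 1]
f1 = [0, 0, 1, 1, 0, 1, 1, 1]
f2 = [1, 0, 0, 1, 1, 0, 0, 1]

# With a strong enough redundancy penalty, the QUBO prefers the
# non-redundant pair {f0, f2} over the highly redundant pair {f0, f1}.
print(qubo_select([f0, f1, f2], y, k=2, alpha=2.0))  # → (0, 2)
```

A relevance-only selector (the classical baseline the abstract contrasts with) would rank f0 and f1 highest and pick the redundant pair; the quadratic term is precisely what steers the selection away from that.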