Training a quantum annealing based restricted Boltzmann machine on cybersecurity data

We present a real-world application that uses a quantum computer. Specifically, we train a restricted Boltzmann machine (RBM) using quantum annealing (QA) for cybersecurity applications. QA is implemented on the D-Wave 2000Q. RBMs are trained on the ISCX data, a benchmark dataset for cybersecurity. For comparison, RBMs are also trained using contrastive divergence (CD), a commonly used method for RBM training. Our analysis of the ISCX data shows that the dataset is imbalanced. We present two different schemes to balance the training dataset before feeding it to a classifier. The first scheme is based on undersampling of benign instances. The imbalanced training dataset is divided into five sub-datasets, each of which is used for training separately. A majority vote is then performed to obtain the final result. Our results show that the majority vote increases the classification accuracy from 90.24% to 95.68% in the case of CD. In the case of QA, the classification accuracy increases from 74.14% to 80.04%. In the second scheme, an RBM is used to generate synthetic data to balance the training dataset. We show that both QA-trained and CD-trained RBMs can be used to generate useful synthetic data. The balanced training data is used to evaluate several classifiers. Among the classifiers investigated, K-Nearest Neighbor (KNN) and Neural Network (NN) perform better than the others; both reach an accuracy of 93%. Our results provide a proof of concept that a QA-based RBM can be trained on a 64-bit binary dataset. This illustrative example suggests that many practical classification problems could be migrated to QA-based techniques. Further, we show that synthetic data generated from an RBM can be used to balance the original dataset.
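The first balancing scheme (undersampling of benign instances followed by a majority vote) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each of the five sub-datasets pairs a distinct fifth of the benign instances with all attack instances, which the abstract does not spell out. It also uses a nearest-centroid rule as a hypothetical stand-in for the RBM-based classifier. The function names and the 4-bit toy data are invented for illustration.

```python
import random
from collections import Counter


def make_balanced_subsets(benign, attack, n_subsets=5, seed=0):
    """Assumed scheme: split the benign rows into n_subsets disjoint chunks
    and pair each chunk with all attack rows (label 0 = benign, 1 = attack)."""
    rng = random.Random(seed)
    shuffled = benign[:]
    rng.shuffle(shuffled)
    chunks = [shuffled[i::n_subsets] for i in range(n_subsets)]
    return [[(x, 0) for x in chunk] + [(x, 1) for x in attack] for chunk in chunks]


def train_centroid_classifier(labeled_rows):
    """Hypothetical stand-in for the RBM-based classifier: nearest class centroid."""
    sums, counts = {}, Counter()
    for x, y in labeled_rows:
        counts[y] += 1
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(centroids, key=lambda y: sq_dist(centroids[y]))

    return predict


def majority_vote(classifiers, x):
    """Return the label predicted by most of the classifiers."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Toy imbalanced data (invented): 50 mostly-zero benign rows, 10 mostly-one attack rows.
benign = [[1 if (i + j) % 10 == 0 else 0 for j in range(4)] for i in range(50)]
attack = [[0 if (i + j) % 5 == 0 else 1 for j in range(4)] for i in range(10)]

subsets = make_balanced_subsets(benign, attack)
classifiers = [train_centroid_classifier(s) for s in subsets]

print(majority_vote(classifiers, [1, 1, 1, 1]))
print(majority_vote(classifiers, [0, 0, 0, 0]))
```

With an odd number of voters and two classes, the vote cannot tie, which is one plausible reason to use five sub-datasets. Each sub-dataset here holds 10 benign and 10 attack rows, so every individual classifier sees balanced classes.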
