Botnets are among the most prevalent online threats, causing losses in the billions to economies worldwide. The growing number of Internet-connected devices makes it necessary to analyze large volumes of network traffic. In this work, we aim to improve botnet traffic classification by selecting the features that contribute most to the detection rate. To this end, we apply two feature selection techniques, Information Gain and Gini Importance, which yield three candidate subsets of five, six, and seven features. We then evaluate each subset with three models: Decision Tree, Random Forest, and k-Nearest Neighbors. To test the three feature subsets and the three models, we generate two datasets derived from CTU-13, namely QB-CTU13 and EQB-CTU13. We measure performance as the ratio of the macro-averaged F1 score to the computational time required to classify a sample. The highest performance is achieved by the Decision Tree with the five-feature subset, which obtains a mean F1 score of 85% while classifying each sample in 0.78 microseconds on average.
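The pipeline summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses synthetic data in place of QB-CTU13 and EQB-CTU13, estimates Information Gain with mutual information, and takes Gini Importance from a Random Forest's impurity-based importances. The helper names `top_k` and `score_per_time` are hypothetical, and timings depend on the machine and on batch size.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for flow-level features (assumption: NOT the CTU-13 derived data).
X, y = make_classification(
    n_samples=2000, n_features=12, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def top_k(scores, k):
    """Return the indices of the k highest-scoring features, in ascending order."""
    return np.sort(np.argsort(scores)[::-1][:k])


# Information Gain, approximated here by mutual information with the class label.
ig_scores = mutual_info_classif(X_train, y_train, random_state=0)

# Gini Importance: impurity-based importances of a forest grown with the Gini criterion.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
gini_scores = forest.feature_importances_

ig_cols = top_k(ig_scores, 5)
gini_cols = top_k(gini_scores, 5)


def score_per_time(model, cols):
    """Fit on the selected columns; return (macro F1, microseconds per sample, ratio)."""
    model.fit(X_train[:, cols], y_train)
    start = time.perf_counter()
    pred = model.predict(X_test[:, cols])
    us_per_sample = (time.perf_counter() - start) * 1e6 / len(X_test)
    f1 = f1_score(y_test, pred, average="macro")
    return f1, us_per_sample, f1 / us_per_sample


results = {
    "information_gain": score_per_time(DecisionTreeClassifier(random_state=0), ig_cols),
    "gini_importance": score_per_time(DecisionTreeClassifier(random_state=0), gini_cols),
}

for name, (f1, us, ratio) in results.items():
    print(f"{name}: macro F1={f1:.3f}, {us:.2f} us/sample, F1 per us={ratio:.3f}")
```

Dividing the macro-averaged F1 score by the per-sample classification time rewards models that are both accurate and fast, so a slightly less accurate model can still rank first if it classifies much more quickly. Replacing `DecisionTreeClassifier` with `RandomForestClassifier` or `KNeighborsClassifier`, and `5` with `6` or `7`, would cover the other combinations mentioned in the abstract.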