Ransomware has emerged as one of the major global threats in recent times. The alarming growth in ransomware attacks and the steady appearance of new variants drive researchers in this domain to continually examine the distinguishing traits of ransomware and to refine their detection and classification strategies. Among the broad range of behavioral characteristics, Application Programming Interface (API) calls and network behaviors have been widely used as differentiating factors for ransomware detection or classification. Many prior approaches have shown promising results in detecting and classifying ransomware families using these features without applying any feature selection technique. Feature selection, however, is a potentially valuable step toward an efficient detection or classification Machine Learning model: it reduces the probability of overfitting by removing redundant data, can improve the model's accuracy by eliminating irrelevant features, and reduces training time. A good number of feature selection techniques are already used in different security scenarios to optimize the performance of Machine Learning models. The aim of this study is therefore to present a comparative performance analysis of widely used supervised Machine Learning models, with and without the Recursive Feature Elimination with Cross-Validation (RFECV) feature selection technique, for ransomware classification using API call and network traffic features. The study thereby provides insight into the effectiveness of RFECV for ransomware classification, which peers can use as a reference when choosing a feature selection technique in future work in this domain.
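The with/without comparison described above can be illustrated by a minimal sketch, assuming scikit-learn and a synthetic dataset. Nothing here reproduces the study's data, models, or settings: the feature matrix from `make_classification` is only a stand-in for API call and network traffic features, and the random forest, the three classes, and all hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold, train_test_split

# Synthetic stand-in (assumption) for an API-call / network-traffic feature
# matrix: 30 features, of which 6 are informative and 10 are redundant.
X, y = make_classification(
    n_samples=300, n_features=30, n_informative=6, n_redundant=10,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline: the classifier trained on all features.
baseline = RandomForestClassifier(n_estimators=25, random_state=0)
baseline.fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)

# RFECV: recursively drop the least important features and use
# cross-validation on the training split to pick how many to keep.
selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=25, random_state=0),
    step=2,  # remove two features per elimination round
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
    min_features_to_select=1,
)
selector.fit(X_train, y_train)

# score() reduces X_test to the selected features, then evaluates the
# estimator that RFECV refit on those features.
selected_acc = selector.score(X_test, y_test)

print(f"all {X.shape[1]} features: accuracy = {baseline_acc:.3f}")
print(f"RFECV kept {selector.n_features_} features: accuracy = {selected_acc:.3f}")
```

Fitting the selector on the training split only keeps the held-out accuracy free of selection bias. Whether the reduced feature set helps or hurts depends on the data and the model.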