As intrusion detection and prevention systems (IDPSs) become more important, managing the signatures generated from malicious communication pattern files incurs great costs. For an IDPS to work, network security experts must classify these signatures by importance. We propose and evaluate a machine learning signature classification model with a reject option (RO) to reduce the cost of setting up an IDPS. Training the proposed model requires features that are effective for signature classification. Experts classify signatures with predefined if-then rules, each of which returns a label of low, medium, high, or unknown importance based on keyword matching of the elements in the signature. Accordingly, we first design two types of features used in the keyword matching of the if-then rules: symbolic features (SFs) and keyword features (KFs). Next, we design web information and message features (WMFs) to capture the properties of signatures that do not match any if-then rule. The WMFs are extracted as term frequency-inverse document frequency (TF-IDF) features of the message text in the signatures, and the web information is obtained by scraping the external attack identification systems referenced in each signature. Because misclassification of IDPS signatures must be minimized, as in the medical field, we introduce an RO into the proposed model so that it can abstain from uncertain predictions. We evaluate the proposed model in experiments on two real datasets of expert-labeled signatures: one that can be classified with the if-then rules and one whose elements do not match any if-then rule. On both datasets, the combination of SFs and WMFs outperformed the combination of SFs and KFs. We also perform a feature analysis.
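The abstract does not specify the experts' rules, the exact TF-IDF variant, or how the reject option is implemented, so the following Python sketch only illustrates the three ingredients under stated assumptions. The keyword sets in `RULES` are hypothetical placeholders for the expert if-then rules, `tfidf` uses the plain weighting tf × log(N/df), and `predict_with_reject` uses a simple confidence threshold (a Chow-style rule), which is one common way to realize a reject option but not necessarily the authors' method. All function names and example messages are illustrative.

```python
import math
from collections import Counter

# Hypothetical keyword rules standing in for the experts' if-then rules
# (the real rules are not given in the abstract). Checked in order.
RULES = [
    ({"exploit", "overflow", "shellcode"}, "high"),
    ({"scan", "policy"}, "medium"),
    ({"info", "icmp"}, "low"),
]

def rule_label(message: str) -> str:
    """Return the label of the first rule whose keywords appear, else 'unknown'."""
    tokens = set(message.lower().split())
    for keywords, label in RULES:
        if tokens & keywords:
            return label
    return "unknown"

def tfidf(docs):
    """Plain TF-IDF per document: (count / doc length) * log(N / document frequency)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append(
            {t: (c / len(toks)) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return vectors

def predict_with_reject(probs: dict, threshold: float):
    """Assumed threshold-based reject option: return None (abstain) when the
    highest class probability falls below `threshold`."""
    label, p = max(probs.items(), key=lambda kv: kv[1])
    return label if p >= threshold else None

if __name__ == "__main__":
    print(rule_label("ET EXPLOIT buffer overflow attempt"))  # high
    print(rule_label("GPL ICMP PING"))                        # low
    print(predict_with_reject({"high": 0.5, "low": 0.5}, 0.8))  # None
```

In such a setup, a rejected signature (`None`) would be handed to a human expert; raising the threshold lowers the error rate on accepted predictions at the cost of rejecting more signatures.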