Machine learning (ML)-based methods have recently become attractive for detecting security vulnerability exploits. Unfortunately, state-of-the-art ML models such as long short-term memories (LSTMs) and transformers incur significant computational overheads, which make them infeasible to deploy in real-time environments. We propose ML-FEED, a novel ML-based exploit detection model that enables highly efficient inference without sacrificing performance. First, we develop a novel automated technique to extract vulnerability patterns from the Common Weakness Enumeration (CWE) and Common Vulnerabilities and Exposures (CVE) databases, which keeps ML-FEED aware of the latest cyber weaknesses. Second, ML-FEED does not follow the traditional approach of classifying entire sequences of application programming interface (API) calls into exploit categories, because processing entire sequences incurs huge computational overheads. Instead, it operates at a finer granularity: it predicts the exploits triggered by every API call of the program trace and uses a state table to update the states of these potential exploits and track the progress of potential exploit chains. ML-FEED also employs a feature engineering approach that combines natural language processing-based word embeddings, frequency vectors, and one-hot encoding to detect semantically similar instruction calls. It then updates the states of the predicted exploit categories and triggers an alarm when a vulnerability fingerprint executes. Our experiments show that ML-FEED is 72.9x and 75,828.9x faster than state-of-the-art lightweight LSTM and transformer models, respectively. We trained and tested ML-FEED on 79 real-world exploit categories. It predicts exploit categories in real time with 98.2% precision, 97.4% recall, and 97.8% F1 score, outperforming the LSTM and transformer baselines.
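The per-call prediction and state-table tracking described above can be illustrated with a minimal sketch. It is not ML-FEED's implementation. The exploit fingerprints, category names, and `toy_predictor` are all hypothetical. In particular, `toy_predictor` is a simple rule-based stand-in for the trained per-call classifier, which in ML-FEED relies on word embeddings, frequency vectors, and one-hot features. The sketch shows only the general idea: each API call advances a small per-category state, and an alarm fires when a fingerprint chain completes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set, Tuple

# HYPOTHETICAL fingerprints: each exploit category is modeled as an ordered
# chain of API calls. These names are illustrative, not taken from ML-FEED,
# CWE, or CVE.
FINGERPRINTS: Dict[str, List[str]] = {
    "hypothetical_buffer_overflow": ["recv", "strcpy", "execve"],
    "hypothetical_path_traversal": ["open", "read", "send"],
}


def toy_predictor(api_call: str) -> Set[str]:
    """HYPOTHETICAL stand-in for the per-call ML classifier: returns every
    category whose fingerprint mentions this API call."""
    return {cat for cat, chain in FINGERPRINTS.items() if api_call in chain}


@dataclass
class StateTable:
    fingerprints: Dict[str, List[str]]
    # Progress per category: how many fingerprint steps have matched so far.
    progress: Dict[str, int] = field(default_factory=dict)

    def update(self, api_call: str, predicted: Set[str]) -> List[str]:
        """Advance each predicted category whose next expected call is
        `api_call`; return the categories whose fingerprint just completed."""
        alarms: List[str] = []
        for cat in sorted(predicted):
            chain = self.fingerprints[cat]
            step = self.progress.get(cat, 0)
            if step < len(chain) and chain[step] == api_call:
                step += 1
                self.progress[cat] = step
                if step == len(chain):
                    alarms.append(cat)
                    self.progress[cat] = 0  # reset so repeated exploits are caught
        return alarms


def monitor(trace: Iterable[str],
            predictor: Callable[[str], Set[str]] = toy_predictor
            ) -> List[Tuple[int, str]]:
    """Process a trace one call at a time; return (call index, category) alarms."""
    table = StateTable(FINGERPRINTS)
    raised: List[Tuple[int, str]] = []
    for i, call in enumerate(trace):
        for cat in table.update(call, predictor(call)):
            raised.append((i, cat))
    return raised


if __name__ == "__main__":
    trace = ["open", "recv", "malloc", "strcpy", "read", "execve", "send"]
    print(monitor(trace))
    # → [(5, 'hypothetical_buffer_overflow'), (6, 'hypothetical_path_traversal')]
```

Each call triggers only a small, bounded update per predicted category, so the whole sequence is never re-classified. That is the kind of saving the abstract attributes to operating at API-call granularity. A real detector would replace `toy_predictor` with a learned model and would likely need a more tolerant matching policy than this strict in-order check.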