In this paper, we propose a framework for early-stage malware detection and mitigation that leverages natural language processing (NLP) techniques and machine learning algorithms. Our primary contribution is an approach for predicting the upcoming actions of malware: we treat application programming interface (API) call sequences as natural language input and use a text classification method, specifically a bidirectional long short-term memory (Bi-LSTM) neural network, to predict the next API call. This enables proactive threat identification and mitigation and demonstrates the effectiveness of applying NLP principles to API call sequences. The Bi-LSTM model is evaluated on two datasets, achieving accuracies of 93.6% and 88.8% on the first and second dataset, respectively. Additionally, by modeling consecutive API calls as 2-gram and 3-gram strings, we extract new features that are then processed by a Bagging-XGBoost algorithm to predict the presence of malware at an early stage. The accuracy of the proposed framework is evaluated through simulations.
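
To make the n-gram feature step concrete, the following is a minimal sketch of how consecutive API calls might be turned into 2-gram and 3-gram strings and counted. It is an illustration under stated assumptions, not the authors' implementation: the function names `api_ngrams` and `ngram_features`, the underscore separator, and the example trace are all hypothetical, and the abstract does not specify how the resulting features are encoded before they reach the Bagging-XGBoost stage.

```python
from collections import Counter


def api_ngrams(calls, n):
    """Return the consecutive n-grams of an API call sequence as joined strings."""
    # The "_" separator is an assumption made for readability.
    return ["_".join(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def ngram_features(calls, sizes=(2, 3)):
    """Count every 2-gram and 3-gram string found in one API call sequence."""
    counts = Counter()
    for n in sizes:
        counts.update(api_ngrams(calls, n))
    return counts


# Hypothetical trace; the API names are illustrative only.
trace = ["NtOpenFile", "NtReadFile", "NtWriteFile", "NtReadFile", "NtWriteFile"]
features = ngram_features(trace)
```

Here `features` maps each n-gram string to its number of occurrences in the trace. Such counts could then be arranged into a fixed-length vector over a shared n-gram vocabulary before being passed to a classifier.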