Malicious scripts are an important threat vector for computer infection. Our analysis reveals that the two most prevalent types of malicious scripts are JavaScript and VBScript, and that the percentage of detected JavaScript attacks is on the rise. To address these threats, we investigate two deep recurrent models, LaMP (LSTM and Max Pooling) and CPoLS (Convoluted Partitioning of Long Sequences), which process JavaScript and VBScript as byte sequences. Lower layers capture the sequential nature of these byte sequences, while higher layers classify the resulting embedding as malicious or benign. Unlike previously proposed solutions, our models are trained end to end, which allows discriminative training even for the sequential processing layers. On a large corpus of 296,274 JavaScript files, the best-performing LaMP model achieves a 65.9% true positive rate (TPR) at a false positive rate (FPR) of 1.0%, and the best CPoLS model achieves a TPR of 45.3% at the same FPR. On a collection of 240,504 VBScript files, LaMP and CPoLS yield TPRs of 69.3% and 67.9%, respectively, at an FPR of 1.0%.
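The results above are reported as a TPR at a fixed FPR of 1.0%, that is, the fraction of malicious files detected when the decision threshold is set so that at most 1.0% of benign files are flagged. The following is a minimal sketch of how such a figure can be computed from classifier scores. It is an illustration only, not the evaluation code used in this work; the function name `tpr_at_fpr` and the toy scores and labels are hypothetical.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.01):
    """Return the highest TPR reachable while keeping FPR <= max_fpr.

    scores: classifier outputs, higher means more likely malicious.
    labels: 1 for malicious, 0 for benign.
    (Hypothetical helper for illustration; not from the original work.)
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one malicious and one benign sample")

    # Sweep the threshold from the highest score downwards.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    best_tpr = 0.0
    i = 0
    while i < len(ranked):
        # Consume all samples sharing one score, so that a threshold
        # never separates files with identical scores.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        if fp / negatives > max_fpr:
            break  # lowering the threshold further exceeds the FPR budget
        best_tpr = tp / positives
        i = j
    return best_tpr


# Toy data (assumed): 4 malicious files and 100 benign files.
toy_labels = [1, 1, 1, 1] + [0] * 100
toy_scores = [0.99, 0.95, 0.5, 0.1] + [0.97, 0.9] + [0.0] * 98

print(tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.01))
```

With 100 benign files, an FPR budget of 1.0% permits a single false positive, so the threshold stops just before the second benign file would be flagged.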