Recent work in machine learning has tackled inputs of ever-increasing size, with cybersecurity presenting sequence classification problems of particularly extreme lengths. In the case of Windows executable malware detection, inputs may exceed $100$ MB, which corresponds to a time series with $T=100,000,000$ steps. To date, the closest approach to handling such a task is MalConv, a convolutional neural network capable of processing up to $T=2,000,000$ steps. The $\mathcal{O}(T)$ memory cost of CNNs has prevented their further application to malware detection. In this work, we develop a new approach to temporal max pooling that makes the required memory invariant to the sequence length $T$. This makes MalConv $116\times$ more memory efficient and up to $25.8\times$ faster to train on its original dataset, while removing the input length restrictions on MalConv. We re-invest these gains into improving the MalConv architecture by developing a new Global Channel Gating design, giving us an attention mechanism capable of efficiently learning feature interactions across 100 million time steps, a capability the original MalConv CNN lacked. Our implementation can be found at https://github.com/NeuromorphicComputationResearchProgram/MalConv2
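The core memory trick can be illustrated with a minimal sketch. This is not the authors' implementation; the chunk-processing function, `toy_features` filter bank, and all parameter names below are hypothetical stand-ins. The idea shown is that global temporal max pooling only needs a running per-channel maximum, so a length-$T$ sequence can be consumed in fixed-size chunks with peak memory tied to the chunk size rather than to $T$:

```python
# Hypothetical sketch of streaming temporal max pooling (not the MalConv2
# code): features are computed chunk by chunk and only a running per-channel
# maximum is retained, so memory is invariant to the sequence length T.

def feature_chunks(sequence, chunk_size, feature_fn):
    """Yield per-channel feature vectors for fixed-size chunks of `sequence`."""
    for start in range(0, len(sequence), chunk_size):
        yield feature_fn(sequence[start:start + chunk_size])

def streaming_max_pool(sequence, chunk_size, feature_fn, num_channels):
    """Global temporal max pooling with memory independent of len(sequence)."""
    pooled = [float("-inf")] * num_channels  # running maximum per channel
    for features in feature_chunks(sequence, chunk_size, feature_fn):
        for c in range(num_channels):
            if features[c] > pooled[c]:
                pooled[c] = features[c]
    return pooled

# Toy "feature" function standing in for a convolutional filter bank:
# channel 0 = max byte value in the chunk, channel 1 = chunk sum mod 256.
def toy_features(chunk):
    return [max(chunk), sum(chunk) % 256]

data = list(range(1000))  # stands in for a long byte sequence
print(streaming_max_pool(data, chunk_size=64, feature_fn=toy_features,
                         num_channels=2))  # → [999, 224]
```

In a trainable setting one would additionally need gradients for the winning time steps only, which is what keeps backpropagation memory bounded; the sketch above shows just the forward-pass pooling logic.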