The proliferation of zero-day threats (ZDTs) to companies' networks has been immensely costly and requires novel methods to scan traffic for malicious behavior at massive scale. The diverse nature of normal behavior, along with the huge landscape of attack types, makes deep learning methods an attractive option for their ability to capture highly nonlinear behavior patterns. In this paper, the authors demonstrate an improvement upon a previously introduced methodology, which used a dual-autoencoder approach to identify ZDTs in network flow telemetry. In addition to the previously introduced asset-level graph features, which help abstractly represent the role of a host in its network, this new model uses metric learning to train the second autoencoder on labeled attack data. This not only produces stronger performance but also improves the interpretability of the model by allowing for multiclass classification in the latent space. This can potentially save human threat hunters time when they investigate predicted ZDTs by showing them which known attack classes were nearby in the latent space. The models presented here are also trained and evaluated with two more datasets, and continue to show promising results even when generalizing to new network topologies.
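The idea of showing analysts which known attack classes lie near a predicted ZDT in the latent space can be illustrated with a minimal sketch. It is not the authors' implementation: the centroid-based lookup, the function names `class_centroids` and `nearest_classes`, the class labels, and the toy 2-D latent codes are all assumptions made for illustration. In practice, the latent vectors would come from the metric-learned second autoencoder.

```python
import math
from collections import defaultdict


def class_centroids(latents, labels):
    """Average the latent vectors of each known attack class (hypothetical helper)."""
    groups = defaultdict(list)
    for vec, label in zip(latents, labels):
        groups[label].append(vec)
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in groups.items()
    }


def nearest_classes(query, centroids, k=3):
    """Rank known attack classes by Euclidean distance from a query latent point."""
    ranked = sorted(
        (math.dist(query, centroid), label)
        for label, centroid in centroids.items()
    )
    return [(label, dist) for dist, label in ranked[:k]]


# Toy 2-D latent codes (assumed values, not data from the paper).
latents = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.0, 5.2], [9.0, 0.0]]
labels = ["port_scan", "port_scan", "ddos", "ddos", "brute_force"]

centroids = class_centroids(latents, labels)

# Latent code of a flow flagged as a possible ZDT (assumed value).
neighbours = nearest_classes([4.6, 5.0], centroids, k=2)
for label, dist in neighbours:
    print(f"{label}: distance {dist:.3f}")
```

A ranked list like this is one way an analyst could see at a glance that a flagged flow sits closest to a known class, which is the kind of triage shortcut the abstract describes. Other neighbor definitions, such as k-nearest labeled points, would serve the same purpose.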