The ability to identify applications from the network traffic they generate could be a valuable tool for cyber defense. We report on a machine learning technique that uses netflow-like features to predict which application generated the traffic. In our experiments, we used ground-truth labels obtained from host-based sensors deployed in a large enterprise environment. We applied random forests and multilayer perceptrons to three tasks: browser vs. non-browser identification, browser fingerprinting, and process name prediction. For each task, we demonstrate that machine learning models can achieve high classification accuracy using only netflow-like features.
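The random-forest half of this approach can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The `Flow` record and its fields (destination port, duration, bytes in each direction, packet count, plus a derived mean bytes per packet) are hypothetical stand-ins for netflow-like features. The hyperparameters (25 trees, maximum depth 5, roughly the square root of the feature count as candidates per split) are common defaults chosen for illustration. The training data are synthetic toy flows, not host-sensor labels. Only the browser vs. non-browser task is shown, and the multilayer perceptron is omitted.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Flow:
    """Hypothetical netflow-like record; field names are illustrative assumptions."""
    dst_port: int
    duration_s: float
    bytes_out: int
    bytes_in: int
    packets: int


def features(f: Flow) -> list[float]:
    """Turn one flow record into a numeric feature vector."""
    total = f.bytes_out + f.bytes_in
    return [float(f.dst_port), f.duration_s, float(f.bytes_out),
            float(f.bytes_in), float(f.packets), total / max(f.packets, 1)]


def gini(labels: list[str]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels: list[str]) -> str:
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, depth, max_depth, n_feats, rng):
    """Grow a CART-style tree; each split considers a random subset of features."""
    if depth >= max_depth or len(set(y)) == 1:
        return majority(y)  # leaf: a class label (str)
    best = None
    for j in rng.sample(range(len(X[0])), n_feats):
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            # Weighted Gini impurity of the two children.
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:
        return majority(y)
    _, j, t, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, n_feats, rng)

    return (j, t, grow(left), grow(right))  # internal node (tuple)


def predict_tree(node, x):
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=5, seed=0):
    """Bagging: each tree is trained on a bootstrap sample of the flows."""
    rng = random.Random(seed)
    n_feats = max(1, int(len(X[0]) ** 0.5))  # ~sqrt(#features) per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, n_feats, rng))
    return forest


def predict_forest(forest, x):
    return majority([predict_tree(tree, x) for tree in forest])  # majority vote


def make_toy_flows(n, seed=1):
    """Synthetic, clearly separable toy data; NOT real measurements."""
    rng = random.Random(seed)
    flows, labels = [], []
    for _ in range(n):
        flows.append(Flow(443, rng.uniform(0.1, 5.0), rng.randint(500, 2000),
                          rng.randint(50_000, 200_000), rng.randint(40, 150)))
        labels.append("browser")
        flows.append(Flow(rng.choice([22, 123]), rng.uniform(60.0, 600.0),
                          rng.randint(100, 400), rng.randint(100, 1000),
                          rng.randint(5, 10)))
        labels.append("other")
    return flows, labels


if __name__ == "__main__":
    flows, labels = make_toy_flows(100)
    forest = fit_forest([features(f) for f in flows], labels)
    print(predict_forest(forest, features(Flow(443, 1.2, 900, 80_000, 90))))
```

Bagging over bootstrap samples, combined with a random feature subset at each split, is what distinguishes a random forest from a single decision tree. In practice a library implementation such as scikit-learn's `RandomForestClassifier` would normally replace this hand-rolled version, and real features would come from flow records rather than a toy generator.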