When classifying network traffic, a key challenge is deciding when to perform the classification, i.e., after how many packets. Too early, and the decision basis is too thin to classify a flow confidently; too late, and the tardy labeling delays crucial actions (e.g., shutting down an attack) and ties up computational resources for longer than necessary (e.g., for tracking and storing features). Moreover, the optimal decision timing varies across flows. We present pForest, a system for "As Soon As Possible" (ASAP) in-network classification according to supervised machine learning models on top of programmable data planes. pForest automatically classifies each flow as soon as its label is sufficiently established, not sooner, not later. The main difficulty in building pForest is finding a strategy for dynamically adapting the features and the classification logic during the lifetime of a flow. pForest solves this problem by: (i) training random forest models tailored to different phases of a flow; and (ii) dynamically switching between these models in real time, on a per-packet basis. pForest models are tuned to fit the constraints of programmable switches (e.g., no floating-point arithmetic, no loops, and limited memory) while providing high accuracy. We implemented a prototype of pForest in Python (training) and P4 (inference). Our evaluation shows that pForest can classify traffic ASAP for hundreds of thousands of flows, with a classification score that is on par with software-based solutions.
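The per-packet switching between phase-specific models can be illustrated with a minimal Python sketch. It is not the pForest implementation: the function names, the three toy features, the integer "certainty" score, and the threshold are all assumptions made for illustration, and simple hand-written rules stand in for trained random forests. The sketch only shows the control flow described above: update the flow's features on every packet, select the model for the current phase, and emit a label as soon as the certainty reaches the threshold.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical model type: maps flow features to (label, certainty in percent).
# Integers are used throughout, mirroring the "no floating point" constraint.
Features = Dict[str, int]
Model = Callable[[Features], Tuple[str, int]]


def update_features(features: Features, packet_len: int) -> None:
    """Update the per-flow state with one packet (toy feature set, assumed)."""
    features["count"] += 1
    features["bytes"] += packet_len
    features["max_len"] = max(features["max_len"], packet_len)


def classify_asap(
    packet_lengths: List[int],
    models: List[Tuple[int, Model]],
    threshold: int,
) -> Optional[Tuple[str, int]]:
    """Return (label, packets_seen) as soon as a label is certain enough.

    `models` holds (first_packet_count, model) pairs sorted by packet count;
    the model for the current phase is the last one whose start was reached.
    Returns None if the flow ends before any label reaches the threshold.
    """
    features: Features = {"count": 0, "bytes": 0, "max_len": 0}
    for length in packet_lengths:
        update_features(features, length)
        active: Optional[Model] = None
        for start, model in models:
            if features["count"] >= start:
                active = model
        if active is None:
            continue  # no model covers this phase yet
        label, certainty = active(features)
        if certainty >= threshold:
            return label, features["count"]
    return None


# Toy stand-ins for phase-specific random forests (assumed rules, not trained).
def early_model(f: Features) -> Tuple[str, int]:
    return ("bulk", 90) if f["max_len"] > 1000 else ("interactive", 55)


def late_model(f: Features) -> Tuple[str, int]:
    avg_len = f["bytes"] // f["count"]
    return ("bulk", 95) if avg_len > 500 else ("interactive", 85)


MODELS = [(1, early_model), (3, late_model)]
```

With these toy rules, a flow that opens with a large packet is labeled after its first packet, whereas a flow of small packets is labeled only once the later-phase model becomes active at the third packet. This reflects the observation that the optimal decision timing varies across flows.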