Network management tasks increasingly apply machine learning to network traffic. Both the accuracy of a machine learning model and its effectiveness in practice ultimately depend on how raw network traffic is represented as features. The representation of the traffic is often as important as the choice of model itself; furthermore, the features that a model relies on ultimately determine where (and even whether) the model can be deployed in practice. This paper develops a new framework and system that enable joint evaluation of both conventional notions of machine learning performance (e.g., model accuracy) and the systems-level costs of different representations of network traffic. We highlight these two dimensions for a practical network management task, video streaming quality inference, and show that the appropriate operating point between them depends on the deployment scenario. We demonstrate the benefit of exploring a range of representations of network traffic and present Traffic Refinery, a proof-of-concept reference implementation that both monitors network traffic at 10 Gbps and transforms the traffic in real time to produce a variety of feature representations for machine learning models. Traffic Refinery both highlights this design space and makes it easy for network operators to explore different representations for learning, balancing the systems costs of feature extraction and model training against the resulting model performance.
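To make the trade-off between representation and systems cost concrete, the following is a minimal Python sketch. It is not Traffic Refinery's implementation or API. It derives two hypothetical feature representations from the same packet records: coarse per-flow counters, and per-flow byte counts in fixed time bins. It then counts stored values as a rough proxy for the memory cost of each. The `Packet` record, the function names, and the 1-second bin width are all illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical minimal packet record (assumption, not Traffic Refinery's format)."""
    timestamp: float  # seconds since capture start
    flow_id: str      # e.g., an identifier derived from the 5-tuple
    size: int         # bytes


def flow_counters(packets):
    """Coarse representation: (total bytes, packet count) per flow.

    State is constant per flow (two integers), regardless of flow duration.
    """
    feats = defaultdict(lambda: [0, 0])
    for p in packets:
        feats[p.flow_id][0] += p.size
        feats[p.flow_id][1] += 1
    return {fid: tuple(v) for fid, v in feats.items()}


def binned_bytes(packets, bin_seconds=1.0):
    """Finer representation: bytes per fixed-width time bin, per flow.

    State grows with the number of non-empty bins, so it costs more memory
    but preserves temporal structure that a model may rely on.
    """
    feats = defaultdict(lambda: defaultdict(int))
    for p in packets:
        feats[p.flow_id][int(p.timestamp // bin_seconds)] += p.size
    return {fid: dict(bins) for fid, bins in feats.items()}


def state_size(features):
    """Count stored scalar values across all flows as a rough memory-cost proxy."""
    return sum(len(v) for v in features.values())


# Illustrative (made-up) packet trace with two flows.
packets = [
    Packet(0.1, "a", 1500),
    Packet(0.5, "a", 1500),
    Packet(1.2, "a", 500),
    Packet(2.7, "b", 100),
    Packet(3.4, "a", 1500),
    Packet(4.0, "a", 200),
]

if __name__ == "__main__":
    coarse = flow_counters(packets)
    fine = binned_bytes(packets)
    print(coarse, state_size(coarse))
    print(fine, state_size(fine))
```

On this toy trace the binned representation already stores more values than the counters (5 versus 4), and the gap widens as flows get longer. This is the kind of systems cost that has to be weighed against any accuracy gained from the finer features.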