This study investigates how well machine learning models detect network anomalies when they are given partial rather than complete flow information. We systematically evaluate models under varying training and testing conditions and quantify the performance impact of the incomplete data typical of real-time environments. Our findings show a significant performance gap: precision and recall drop by up to 30% under certain conditions when models trained on complete flows are tested on partial flows. Conversely, models trained and tested on consistently complete or consistently partial datasets remain robust. The study also finds that a minimum of 7 packets per flow in the test set is required to maintain reliable detection rates, a threshold that can inform real-time detection strategies. These results offer guidance for deploying machine learning models in operational network security environments.
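To make the partial-versus-complete distinction concrete, the following is a minimal sketch, not the study's actual pipeline. It assumes a flow is a list of (timestamp, size) packet records and that a partial flow is simply the first k packets of a complete flow. The function names, the feature set, and the sample packets are all hypothetical.

```python
from statistics import mean

# Assumption: each packet is a (timestamp_seconds, size_bytes) tuple.
# Names and features below are illustrative, not taken from the study.

def truncate_flow(packets, k):
    """Return the first k packets of a flow, i.e. a 'partial flow'."""
    return packets[:k]

def flow_features(packets):
    """Compute a few simple per-flow features from a non-empty packet list."""
    sizes = [size for _, size in packets]
    times = [ts for ts, _ in packets]
    return {
        "packet_count": len(packets),
        "total_bytes": sum(sizes),
        "mean_size": mean(sizes),
        "duration": times[-1] - times[0],
    }

# A made-up complete flow of 8 packets.
complete = [(0.00, 60), (0.01, 60), (0.05, 1500), (0.09, 1500),
            (0.12, 400), (0.20, 60), (0.31, 1500), (0.45, 52)]

# Partial view, as a real-time detector might see it after 7 packets.
partial = truncate_flow(complete, 7)

full_features = flow_features(complete)
partial_features = flow_features(partial)
```

A model trained on features like `full_features` but evaluated on `partial_features` sees shifted values for duration and byte totals, which is one plausible source of the train/test mismatch described above.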