In this case study, we describe the design and assembly of a cyber security testbed at Oak Ridge National Laboratory in Oak Ridge, TN, USA. The range is designed for agile reconfiguration, facilitating a wide variety of experiments that evaluate cyber security tools, particularly those involving AI/ML. The testbed provides realistic test environments while permitting control and programmatic observation and data collection during experiments. Evaluations are designed to be repeatable, so additional tools can be evaluated and compared at a later time, and the system can be scaled up or down to suit experiment size. At the time of the conference, we will have completed two full-scale, national, government challenges on this range. These challenges evaluate the performance and operating costs of AI/ML-based cyber security tools for application to large, government-sized networks, and they are described here as examples providing motivation and context for various design decisions and adaptations we have made. The first challenge measured endpoint security tools against 100K file samples (benignware and malware) chosen across a range of file types. The second evaluates the efficacy of network intrusion detection systems in identifying multi-step adversarial campaigns, involving reconnaissance, penetration and exploitation, lateral movement, etc., with varying levels of covertness in a high-volume business network. The scale of each of these challenges requires automation to repeat, or simultaneously mirror, identical experiments for each ML tool under test. Providing an array of easy-to-difficult malicious activity to reveal the true abilities of the AI/ML tools has been a particularly interesting and challenging aspect of designing and executing these challenge events.
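To make the repeatability and mirroring requirement concrete, the following is a minimal sketch, not the testbed's actual harness, of how an orchestrator might replay an identical, ordered batch of file samples to several tools under test and record their verdicts for later comparison. The `ToolUnderTest` interface, its `scan()` method, and the CSV output layout are all illustrative assumptions.

```python
# Hypothetical sketch: feed the same ordered sample set to every tool under
# test so verdicts are directly comparable, and record results so the same
# experiment can be repeated later against additional tools.
import csv
import hashlib
from pathlib import Path
from typing import Protocol


class ToolUnderTest(Protocol):
    """Assumed interface for a tool under test; not from the source."""
    name: str

    def scan(self, sample: bytes) -> bool:
        """Return True if the tool flags the sample as malicious."""
        ...


def run_mirrored_experiment(sample_dir: Path,
                            tools: list[ToolUnderTest],
                            out_csv: Path) -> None:
    # Fix the sample order up front so every tool sees the identical
    # sequence, making runs repeatable and results comparable.
    samples = sorted(p for p in sample_dir.iterdir() if p.is_file())
    with out_csv.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["sample_sha256", "tool", "flagged"])
        for path in samples:
            data = path.read_bytes()
            # Key each verdict by content hash so later runs on the same
            # corpus line up row-for-row with earlier ones.
            digest = hashlib.sha256(data).hexdigest()
            for tool in tools:
                writer.writerow([digest, tool.name, tool.scan(data)])
```

In practice, hashing each sample lets results from a tool evaluated months later be joined against the original corpus without rerunning the earlier tools, which is the repeat-and-compare property the range is designed around.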