Datasets are essential for applying AI algorithms to Cyber Physical System (CPS) security. Due to the scarcity of real CPS datasets, researchers have elected to generate their own datasets using either real or virtualized testbeds. However, unlike systems in other AI application domains, a CPS is a complex system whose behavior is determined by its many interfaces. A dataset that comprises merely a collection of sensor measurements and network traffic may not be sufficient to develop resilient AI defensive or offensive agents. In this paper, we study the \emph{elements} of CPS security datasets required to capture the system behavior and interactions, and propose a dataset architecture that has the potential to enhance the performance of AI algorithms in securing cyber physical systems. The proposed architecture covers dataset elements, attack representation, and required dataset features. We compare existing datasets against the proposed architecture to identify their current limitations and discuss the future of CPS dataset generation using testbeds.