State machines are popular models for modeling and visualizing discrete systems such as software systems, and for representing regular grammars. Most algorithms that passively learn state machines from data assume that all the data is available from the beginning and load it into memory. This makes them hard to apply to continuously streaming data and results in large memory requirements when dealing with large datasets. In this paper we propose a method to learn state machines from data streams using the count-min sketch data structure to reduce memory requirements. We apply state merging using the well-known red-blue framework to reduce the search space. We implemented our approach in an established framework for learning state machines and evaluated it on a well-known dataset, showing the effectiveness of our approach with respect to both the quality of the results and run-time.
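The count-min sketch mentioned above is a standard probabilistic data structure for approximate frequency counting in bounded memory. The following is a minimal illustrative sketch of the general technique, not the paper's implementation; the class name, parameters, and hashing scheme are our own assumptions for the example.

```python
import hashlib


class CountMinSketch:
    """Illustrative count-min sketch: approximate frequency counts
    in O(width * depth) memory, independent of the stream length."""

    def __init__(self, width=1000, depth=5):
        # width controls collision rate, depth controls failure probability
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One independent hash per row (seeded by the row index).
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
            yield row, int(digest, 16) % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Never underestimates; may overestimate on hash collisions,
        # so the minimum over all rows is the tightest bound.
        return min(self.table[row][col] for row, col in self._cells(item))
```

In a streaming state-merging setting, such a sketch can stand in for the exact per-state symbol counts that batch learners keep in memory.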