Behavioral software models play a key role in many software engineering tasks; unfortunately, these models are either unavailable during software development or, if available, quickly become outdated as implementations evolve. Model inference techniques have been proposed as a viable solution for extracting finite state models from execution logs. However, existing techniques do not scale well to the very large logs commonly found in practice. In this paper, we address the scalability problem of inferring the model of a component-based system from large system logs, without requiring any extra information. Our model inference technique, called PRINS, follows a divide-and-conquer approach. The idea is to first infer a model of each system component from the corresponding logs; then, the individual component models are merged together, taking into account the flow of events across components as reflected in the logs. We evaluated PRINS in terms of scalability and accuracy, using nine datasets composed of logs extracted from publicly available benchmarks and from a personal computer running desktop business applications. The results show that PRINS can process large logs much faster than a publicly available and well-known state-of-the-art tool, without significantly compromising the accuracy of the inferred models.
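The divide-and-conquer idea described above can be illustrated with a toy sketch. This is not PRINS's actual algorithm: the log format (each trace is a list of `(component, event)` pairs), the helper names (`project`, `infer_component_model`, `infer_system_model`), and the per-component inference step (a naive "1-tail" model in which the state after an event is that event itself) are all hypothetical stand-ins chosen for brevity. The sketch only shows the overall shape: project the system log onto each component, infer one model per component, then stitch the models together with edges recording where control flowed between components in the logs.

```python
# Hypothetical log format: a trace is a list of (component, event) pairs.
Trace = list[tuple[str, str]]

INIT = "init"  # initial state of every component model


def project(trace: Trace, component: str) -> list[str]:
    """Keep only one component's events, preserving their order."""
    return [event for comp, event in trace if comp == component]


def infer_component_model(traces: list[list[str]]) -> set[tuple[str, str, str]]:
    """Toy 1-tail inference: the state after an event is the event itself.

    A stand-in for a real per-component inference tool, not PRINS's method.
    Returns transitions as (source_state, event, target_state) triples.
    """
    transitions = set()
    for trace in traces:
        state = INIT
        for event in trace:
            transitions.add((state, event, event))
            state = event
    return transitions


def infer_system_model(traces: list[Trace]):
    """Divide: infer one model per component. Conquer: stitch them together."""
    components = sorted({comp for trace in traces for comp, _ in trace})

    # Divide: each component model is inferred from its projected logs only.
    system = set()
    for comp in components:
        model = infer_component_model([project(t, comp) for t in traces])
        # Qualify states with the component name so the models stay disjoint.
        system |= {((comp, s), e, (comp, d)) for s, e, d in model}

    # Conquer: add unlabeled "flow" edges wherever the log shows control
    # passing from one component's current state to another's.
    flow = set()
    for trace in traces:
        current = {comp: INIT for comp in components}  # per-component state
        prev_comp = None
        for comp, event in trace:
            if prev_comp is not None and prev_comp != comp:
                flow.add(((prev_comp, current[prev_comp]), (comp, current[comp])))
            current[comp] = event
            prev_comp = comp
    return system, flow


traces = [[("client", "connect"), ("server", "accept"),
           ("client", "send"), ("server", "recv")]]
system, flow = infer_system_model(traces)
print(len(system), len(flow))
```

Because each component model is built from a projected, much shorter log, the expensive inference step runs on small inputs; the stitching pass is a single linear scan over the system log. A real tool would use a far more accurate per-component inference algorithm than this 1-tail stand-in.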