Domains such as manufacturing and medicine require continuous monitoring and analysis of their processes, especially in combination with time series produced by sensors. Time series data can be exploited, for example, to explain and predict concept drift at runtime. Generally, a certain data volume is required to produce meaningful analysis results. However, reliable data sets are often missing, for example, if event streams and time series data are collected separately, if the process is new, or if obtaining a sufficient data volume is too expensive. Additional challenges arise from preparing time series data from multiple event sources, from variations in data collection frequency, and from concept drift. This paper proposes the GENLOG approach to generate reliable event and time series data that follows the distribution of the underlying input data set. GENLOG employs data resampling and enables the user to select different parts of the log data to orchestrate the training of a recurrent neural network for stream generation. The generated data is sampled back to its original sample rate and embedded into a template representing the log data format it originated from. Overall, GENLOG can boost small data sets and, consequently, the application of online process mining.
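The resample, generate, and sample-back steps described in the abstract can be illustrated with a minimal sketch. This is not the GENLOG implementation: the abstract does not state which resampling method is used, so linear interpolation is an assumption here, the sensor readings are hypothetical, and the recurrent neural network is replaced by a pass-through placeholder. The helper names `interpolate` and `uniform_grid` are illustrative only.

```python
from bisect import bisect_left


def interpolate(times, values, query_times):
    """Linearly interpolate (times, values) at each query time.

    `times` must be strictly increasing; queries outside the range are clamped
    to the first or last value.
    """
    result = []
    for t in query_times:
        if t <= times[0]:
            result.append(values[0])
        elif t >= times[-1]:
            result.append(values[-1])
        else:
            i = bisect_left(times, t)  # times[i - 1] < t <= times[i]
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            result.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return result


def uniform_grid(start, stop, step):
    """Return evenly spaced time points from start to stop (inclusive)."""
    n = int(round((stop - start) / step))
    return [start + k * step for k in range(n + 1)]


# Hypothetical sensor readings collected at an irregular rate (seconds, value).
sensor_times = [0.0, 0.4, 1.0, 1.1, 2.0]
sensor_values = [10.0, 12.0, 15.0, 15.5, 11.0]

# Step 1: resample to a uniform rate so a sequence model sees evenly spaced inputs.
grid = uniform_grid(sensor_times[0], sensor_times[-1], 0.5)
resampled = interpolate(sensor_times, sensor_values, grid)

# Step 2 (placeholder, assumption): a trained generator would emit a new series
# on `grid`. Here the resampled series is passed through unchanged.
generated = resampled

# Step 3: sample the generated series back at the original timestamps.
restored = interpolate(grid, generated, sensor_times)
```

Under these assumptions the round trip is lossy: the short-lived reading at 1.1 s falls between two grid points and is smoothed away, which hints at why the choice of the intermediate sample rate matters when the original collection frequency varies.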