To gain a good understanding of a dynamical system, it is convenient to have an interpretable and versatile model of it. Timed discrete event systems are a class of models that meet these requirements. However, such models can be inferred from timestamped event sequences but not directly from numerical data. To bridge this gap, a discretization step must identify events, or symbols, in the time series. Persist is a discretization method that aims to create persisting symbols by means of a measure called the persistence score. This mitigates the risk of undesirable symbol changes that would lead to an overly complex model. After studying the persistence score, we point out that it tends to favor extreme cases, causing it to miss interesting persisting symbols. To correct this behavior, we replace the metric used in the persistence score, the Kullback-Leibler divergence, with the Wasserstein distance. Experiments show that the improved persistence score enhances Persist's ability to capture the information of the original time series and makes it better suited for discrete event systems learning.
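To make the proposed change concrete, the sketch below illustrates, in a simplified form, the kind of substitution the abstract describes: scoring how "persistent" a symbol is by comparing its observed self-transition rate against its marginal frequency, once with a symmetric Kullback-Leibler divergence and once with the Wasserstein distance (via scipy.stats.wasserstein_distance). This is a minimal illustration under assumed definitions, not the paper's exact formulation of the persistence score; the function names and scoring form are assumptions.

```python
# Illustrative sketch (not the paper's exact formulation): score the persistence
# of a symbol by comparing two Bernoulli distributions, the observed
# self-transition rate of the symbol vs. its marginal frequency, using either a
# symmetric KL divergence or the 1-D Wasserstein distance.
import numpy as np
from scipy.stats import wasserstein_distance

def self_transition_rate(symbols, s):
    """Empirical P(next == s | current == s) for symbol s."""
    cur = np.asarray(symbols[:-1]) == s
    nxt = np.asarray(symbols[1:]) == s
    return nxt[cur].mean() if cur.any() else 0.0

def kl_bernoulli(p, q, eps=1e-9):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p, q = np.clip(p, eps, 1 - eps), np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def persistence_kl(symbols, s):
    """Signed, KL-based persistence: positive if s self-transitions more than chance."""
    p = self_transition_rate(symbols, s)      # observed self-transition probability
    m = (np.asarray(symbols) == s).mean()     # marginal frequency of s
    sym_kl = 0.5 * (kl_bernoulli(p, m) + kl_bernoulli(m, p))
    return np.sign(p - m) * sym_kl

def persistence_wasserstein(symbols, s):
    """Same idea, but the two Bernoulli distributions are compared with the Wasserstein distance."""
    p = self_transition_rate(symbols, s)
    m = (np.asarray(symbols) == s).mean()
    # 1-D Wasserstein between Bernoulli(p) and Bernoulli(m) on support {0, 1};
    # for Bernoulli laws this reduces to |p - m|.
    d = wasserstein_distance([0, 1], [0, 1], u_weights=[1 - p, p], v_weights=[1 - m, m])
    return np.sign(p - m) * d

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    persistent = [0] * 50 + [1] * 50 + [0] * 50       # long runs -> persisting symbol
    noisy = rng.integers(0, 2, size=150).tolist()     # random flips -> little persistence
    for name, seq in [("persistent", persistent), ("noisy", noisy)]:
        print(name,
              round(persistence_kl(seq, 1), 3),
              round(persistence_wasserstein(seq, 1), 3))
```

On the persistent sequence both scores are clearly positive for symbol 1, while on the noisy sequence both stay near zero; the two variants differ only in the distance used to compare the two Bernoulli distributions, which is the substitution the abstract motivates.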