Datasets consisting of sequences of events of different types, without meaningful timestamps, are prevalent in many applications, for instance when the sequences are extracted from textual corpora. We propose a family of models for such event sequences, called summary Markov models, in which the probability of observing an event type depends only on a summary of historical occurrences of its influencing set of event types. This Markov model family is motivated by Granger causal models for time series, with the important distinction that only one event can occur at each position in an event sequence. We show that a unique minimal influencing set exists for any set of event types of interest and any choice of summary function, formulate two novel models from the general family that represent specific sequence dynamics, and propose a greedy search algorithm for learning them from event sequence data. We conduct an experimental investigation comparing the proposed models with relevant baselines, and illustrate their knowledge acquisition and discovery capabilities through case studies involving sequences from text.
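To make the idea concrete, here is a minimal sketch of one possible summary Markov model and a greedy search for an influencing set. It assumes a binary summary (for each candidate influencer, whether it occurred among the preceding `window` positions), models "is the event at this position the target type?" as a Bernoulli variable with one parameter per summary configuration, and scores candidate sets with a BIC-style penalty. The function names, the summary choice, and the scoring rule are illustrative assumptions, not necessarily the authors' exact formulation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def binary_summary(seq: Sequence[str], i: int, parents: List[str], window: int) -> Tuple[bool, ...]:
    """Hypothetical binary summary of the history before position i:
    for each candidate influencer, did it occur in the preceding `window` positions?"""
    history = set(seq[max(0, i - window):i])
    return tuple(p in history for p in parents)


def log_likelihood(seq: Sequence[str], target: str, parents: List[str], window: int) -> float:
    """Maximum-likelihood log-likelihood of 'seq[i] == target' at every position,
    with one Bernoulli parameter per distinct summary value (assumed model)."""
    counts: Dict[Tuple[bool, ...], List[int]] = {}
    for i, event in enumerate(seq):
        c = counts.setdefault(binary_summary(seq, i, parents, window), [0, 0])
        c[0] += event == target  # number of positions holding the target type
        c[1] += 1                # number of positions with this summary value
    ll = 0.0
    for hits, total in counts.values():
        for k in (hits, total - hits):
            if k > 0:
                ll += k * math.log(k / total)
    return ll


def bic_score(seq: Sequence[str], target: str, parents: List[str], window: int) -> float:
    """BIC-style score (an assumption): one free parameter per summary configuration."""
    n_params = 2 ** len(parents)
    return log_likelihood(seq, target, parents, window) - 0.5 * n_params * math.log(len(seq))


def greedy_influencing_set(seq: Sequence[str], target: str, candidates: List[str], window: int) -> List[str]:
    """Forward greedy search: repeatedly add the candidate that most improves the score,
    stopping when no addition helps."""
    parents: List[str] = []
    best = bic_score(seq, target, parents, window)
    while True:
        scored = [(bic_score(seq, target, parents + [c], window), c)
                  for c in candidates if c not in parents]
        if not scored:
            break
        score, cand = max(scored)
        if score <= best:
            break
        parents.append(cand)
        best = score
    return parents


if __name__ == "__main__":
    # Toy sequence in which every "B" is immediately preceded by an "A".
    seq = list("ABCDCDAB" * 20)
    print(greedy_influencing_set(seq, "B", ["A", "C", "D"], window=1))  # ['A']
```

In this toy sequence, "A" in the previous position perfectly predicts "B", so the search adds "A" and then stops, because the penalty for a second influencer outweighs a likelihood that is already maximal. A richer summary (for example, one that encodes the order of recent occurrences) would only require changing `binary_summary` and the parameter count in `bic_score`.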