Continuous-time event sequences, i.e., sequences of continuous time stamps with associated event types ("marks"), are an important type of sequential data with many applications, e.g., in clinical medicine or user behavior modeling. Since these data are typically modeled autoregressively (e.g., using neural Hawkes processes or their classical counterparts), it is natural to ask questions about future scenarios, such as "what kind of event will occur next" or "will an event of type $A$ occur before one of type $B$". Unfortunately, some of these queries are notoriously hard to answer, since current methods are limited to naive simulation, which can be highly inefficient. This paper introduces a new typology of queries and a framework for answering them using importance sampling. Example queries include predicting the $n^\text{th}$ event type in a sequence and the hitting time distribution of one or more event types. We further extend these results to estimate general "$A$ before $B$" queries. We prove theoretically that our estimation method is effectively always better than naive simulation, and we show empirically on three real-world datasets that it is on average 1,000 times more efficient than existing approaches.
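To make the contrast between naive simulation and importance sampling concrete, the following is a minimal sketch on a toy model, not the estimator or the models used in the paper. It assumes a hypothetical two-mark process in which mark $B$ arrives at a constant rate `mu_b` and each $B$ event permanently raises the intensity of mark $A$ by `alpha`, so that $\lambda_A(t) = \texttt{mu\_a} + \texttt{alpha} \cdot N_B(t)$ with `mu_a > 0`. The query is the hitting-time probability that no $A$ event occurs before `horizon`. The naive estimator simulates both marks and averages a 0/1 indicator. The weighted estimator samples from a proposal in which $A$ events are suppressed and averages the likelihood-ratio weight $\exp(-\int_0^T \lambda_A(s)\,ds)$. All function and parameter names are illustrative assumptions.

```python
import math
import random


def sample_b_times(mu_b, horizon, rng):
    """Arrival times of a homogeneous Poisson process with rate mu_b on [0, horizon)."""
    times, t = [], rng.expovariate(mu_b)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(mu_b)
    return times


def naive_indicator(mu_a, mu_b, alpha, horizon, rng):
    """Simulate both marks; return 1.0 if no A event occurs before `horizon`."""
    edges = sample_b_times(mu_b, horizon, rng) + [horizon]
    start = 0.0
    for k, end in enumerate(edges):
        rate_a = mu_a + alpha * k  # A's intensity after k events of type B
        # A's intensity is constant on [start, end), so its waiting time is exponential.
        if rng.expovariate(rate_a) < end - start:
            return 0.0
        start = end
    return 1.0


def weighted_sample(mu_a, mu_b, alpha, horizon, rng):
    """Sample with A suppressed; return the weight exp(-integral of A's intensity)."""
    b_times = sample_b_times(mu_b, horizon, rng)
    integral = mu_a * horizon + alpha * sum(horizon - t for t in b_times)
    return math.exp(-integral)


def exact_probability(mu_a, mu_b, alpha, horizon):
    """Closed form for this toy model (Laplace functional of a Poisson process)."""
    inner = horizon - (1.0 - math.exp(-alpha * horizon)) / alpha
    return math.exp(-mu_a * horizon - mu_b * inner)


def estimate(sampler, n, seed, **params):
    """Return the sample mean and unbiased sample variance of n draws."""
    rng = random.Random(seed)
    draws = [sampler(rng=rng, **params) for _ in range(n)]
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / (n - 1)
    return mean, var
```

Calling `estimate(naive_indicator, ...)` and `estimate(weighted_sample, ...)` with the same parameters gives two estimates of the same probability. In this toy setting the weighted draws are conditional expectations of the naive indicator given the $B$ path, so their variance cannot exceed that of the naive draws, which mirrors the kind of efficiency gain the abstract describes.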