Massive samples of event sequence data occur in various domains, including e-commerce, healthcare, and finance. There are two main challenges in modelling such data: computational and methodological. The amount of available data and the length of event sequences per client are typically large, so long-term dependencies must be modelled. Moreover, this data is often sparse and non-uniform, making classic approaches for time series processing inapplicable. Existing solutions for such cases include recurrent and transformer architectures. To handle continuous time, these models add specific parametric intensity functions, defined at every moment, on top of the base architecture. Due to their parametric nature, these intensities represent only a limited class of event sequences. We propose the COTIC method, based on a continuous convolutional neural network suited to events that occur non-uniformly in time. In COTIC, dilations and a multi-layer architecture efficiently handle dependencies between events. Furthermore, the model provides general intensity dynamics in continuous time, including the self-excitement encountered in practice. The COTIC model outperforms existing approaches on the majority of the considered datasets and produces event sequence embeddings that can be used to solve downstream tasks, e.g. predicting the next event type and return time. The code of the proposed method can be found in the GitHub repository (https://github.com/VladislavZh/COTIC).
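The abstract describes a convolution that handles irregularly spaced events. In such a convolution, weights depend on real-valued time gaps rather than on discrete offsets. The following is a minimal sketch of that idea, and it rests on several assumptions. The kernel is a plain function of the time gap: a fixed exponential here, standing in for a learned network. Dilation skips events by index. The intensity is a softplus of the convolution output. The names `conv_at`, `intensity`, and `exp_kernel` are hypothetical, and this is not the COTIC implementation from the linked repository.

```python
import bisect
import math
from typing import Callable, Sequence

# Hypothetical kernel type: maps a non-negative time gap to a weight.
Kernel = Callable[[float], float]


def conv_at(
    t: float,
    times: Sequence[float],
    values: Sequence[float],
    kernel: Kernel,
    kernel_size: int = 3,
    dilation: int = 1,
) -> float:
    """Evaluate a causal continuous convolution at an arbitrary time t.

    `times` must be sorted ascending. Uses the latest event at or before t
    and up to kernel_size - 1 earlier events spaced `dilation` indices apart.
    Each contribution is weighted by kernel(t - t_i), a function of the
    real-valued time gap, so no resampling to a uniform grid is needed.
    """
    last = bisect.bisect_right(times, t) - 1  # index of latest event <= t
    acc = 0.0
    for k in range(kernel_size):
        i = last - k * dilation
        if i < 0:  # no earlier events left in the receptive field
            break
        acc += kernel(t - times[i]) * values[i]
    return acc


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)); keeps the intensity positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def intensity(
    t: float,
    times: Sequence[float],
    values: Sequence[float],
    kernel: Kernel,
    kernel_size: int = 3,
    dilation: int = 1,
) -> float:
    # Assumed intensity head: a positive transform of the convolution output.
    return softplus(conv_at(t, times, values, kernel, kernel_size, dilation))


def exp_kernel(dt: float) -> float:
    # Stand-in for a learned kernel network: weight decays with the time gap.
    return math.exp(-dt)
```

Consider `times = [0.0, 0.5, 2.0, 2.1]` with unit `values` and `kernel_size=2`. Here `conv_at(1.0, ...)` returns `exp(-0.5) + exp(-1.0)`, about 0.974. Between events the output decays with elapsed time, and it jumps when a new event arrives. With a positive kernel, this reproduces the self-exciting behaviour the abstract mentions. A learned kernel could also take negative values. Stacking such layers with growing dilations widens the receptive field without enlarging each kernel. This is one plausible reading of how dilations help with long-range dependencies.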