We address the problem of self-supervised learning on discrete event sequences generated by real-world users. Self-supervised learning condenses complex information from raw data into low-dimensional, fixed-length vector representations that can be readily used in a variety of downstream machine learning tasks. In this paper, we propose a new method, CoLES, which adapts contrastive learning, previously used in the audio and computer vision domains, to discrete event sequences in a self-supervised setting. We deployed CoLES embeddings built on transaction sequences at a large European financial services company. Using CoLES embeddings significantly improves the performance of pre-existing models on downstream tasks and yields substantial financial gains, measured in hundreds of millions of dollars per year. We also evaluated CoLES on several public event-sequence datasets and show that CoLES representations consistently outperform other methods across different downstream tasks.
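To make the idea concrete, below is a minimal sketch of contrastive learning over event sequences: two random subsequences of the same user's event stream form a positive pair, while subsequences from other users in the batch act as negatives, and an encoder is trained so that positives land close together in the embedding space. This is an illustrative assumption-level sketch, not the authors' exact CoLES implementation; the encoder architecture, the cropping augmentation, and all names and hyperparameters (SeqEncoder, sample_subsequence, temperature, etc.) are our own choices for illustration.

```python
# Illustrative sketch of contrastive learning on event sequences
# (not the authors' exact CoLES implementation; all names and
# hyperparameters here are assumptions made for the example).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SeqEncoder(nn.Module):
    """Maps a padded batch of event-code sequences to fixed-length vectors."""

    def __init__(self, num_event_types: int, embed_dim: int = 32, hidden_dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(num_event_types, embed_dim, padding_idx=0)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)

    def forward(self, events: torch.Tensor) -> torch.Tensor:
        # events: (batch, seq_len) integer event codes, 0 = padding.
        # For brevity, padding is not masked via pack_padded_sequence.
        _, h = self.rnn(self.embed(events))
        return h[-1]  # (batch, hidden_dim) fixed-length representation


def sample_subsequence(seq: torch.Tensor, min_len: int = 5) -> torch.Tensor:
    """Randomly crops a contiguous subsequence (one simple augmentation)."""
    length = torch.randint(min_len, seq.numel() + 1, (1,)).item()
    start = torch.randint(0, seq.numel() - length + 1, (1,)).item()
    return seq[start:start + length]


def contrastive_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss: two crops of the same user's sequence are positives;
    crops from other users in the batch serve as negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature      # (batch, batch) similarity matrix
    targets = torch.arange(z1.size(0))      # positives lie on the diagonal
    return F.cross_entropy(logits, targets)


# Usage sketch: each (synthetic) user's sequence yields two random crops.
torch.manual_seed(0)
encoder = SeqEncoder(num_event_types=100)
users = [torch.randint(1, 100, (torch.randint(10, 40, (1,)).item(),)) for _ in range(8)]
view1 = nn.utils.rnn.pad_sequence([sample_subsequence(s) for s in users], batch_first=True)
view2 = nn.utils.rnn.pad_sequence([sample_subsequence(s) for s in users], batch_first=True)
loss = contrastive_loss(encoder(view1), encoder(view2))
loss.backward()
```

After training, the encoder output for a user's full event sequence serves as the fixed-length embedding passed to downstream models.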