Recent developments in predictive modeling using marked temporal point processes (MTPP) have enabled an accurate characterization of several real-world applications involving continuous-time event sequences (CTESs). However, the retrieval problem for such sequences remains largely unaddressed in the literature. To tackle this, we propose NEUROSEQRET, which learns to retrieve and rank a relevant set of continuous-time event sequences for a given query sequence from a large corpus of sequences. More specifically, NEUROSEQRET first applies a trainable unwarping function to the query sequence, which makes it comparable with corpus sequences, especially when the two sequences of a relevant query-corpus pair have individually different attributes. Next, it feeds the unwarped query sequence and the corpus sequence into MTPP-guided neural relevance models. We develop two variants of the relevance model, which offer a tradeoff between accuracy and efficiency. We also propose an optimization framework to learn binary sequence embeddings from the relevance scores. These embeddings are suitable for locality-sensitive hashing, which leads to a significant speedup in returning top-K results for a given query sequence. Our experiments with several datasets show that NEUROSEQRET achieves a significant accuracy boost over several baselines, and they demonstrate the efficacy of our hashing mechanism.
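To make the hashing step concrete, the following is a minimal sketch of top-K retrieval over binary codes. It is not the NEUROSEQRET implementation: the sequence embeddings are random stand-ins, and the binary codes come from classical random-hyperplane hashing rather than from the learned embeddings the abstract describes. All names (`binary_code`, `build_index`, `retrieve_top_k`) are hypothetical and chosen for illustration only.

```python
import math
import random
from collections import defaultdict


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def binary_code(embedding, hyperplanes):
    """One bit per hyperplane: 1 if the embedding lies on its positive side."""
    return tuple(1 if dot(embedding, h) > 0 else 0 for h in hyperplanes)


def build_index(corpus, hyperplanes):
    """Group corpus indices into buckets keyed by their binary code."""
    buckets = defaultdict(list)
    for idx, emb in enumerate(corpus):
        buckets[binary_code(emb, hyperplanes)].append(idx)
    return buckets


def retrieve_top_k(query, corpus, hyperplanes, buckets, k):
    """Score only the sequences in the query's bucket, then return the best k."""
    candidates = buckets.get(binary_code(query, hyperplanes))
    if not candidates:
        # Assumption: fall back to a full scan when the bucket is empty.
        candidates = range(len(corpus))
    ranked = sorted(candidates, key=lambda i: dot(query, corpus[i]), reverse=True)
    return ranked[:k]


# Random stand-ins for sequence embeddings (assumption: 16 dimensions, 8 bits).
rng = random.Random(0)
dim, n_bits = 16, 8
corpus = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(200)]
hyperplanes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

buckets = build_index(corpus, hyperplanes)
top = retrieve_top_k(corpus[5], corpus, hyperplanes, buckets, k=3)
print(top)
```

The speedup comes from scoring only the candidates that share the query's bucket instead of the whole corpus. The cost is that a relevant sequence whose code differs by even one bit is missed, which is one reason a practical system would typically use several hash tables or learned codes.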