One way of finding interesting or exceptional records in instant-stamped temporal data is to consider their "durability": intuitively, how well they compare with other records that arrived earlier or later, and how long they retain their supremacy. For example, people are naturally fascinated by claims with long durability, such as: "On January 22, 2006, Kobe Bryant dropped 81 points against the Toronto Raptors. Since then, this scoring record has yet to be broken." In general, given a sequence of instant-stamped records, suppose that we can rank them by a user-specified scoring function $f$, which may consider multiple attributes of a record to compute a single score for ranking. This paper studies "durable top-$k$ queries," which find records whose scores were within the top $k$ among those records within a "durability window" of given length, e.g., a 10-year window starting or ending at the timestamp of the record. The parameter $k$, the length of the durability window, and the parameters of the scoring function (which capture user preference) can all be given at query time. We illustrate why this problem formulation yields more meaningful answers in some practical situations than other similar types of queries considered previously. We propose new algorithms for solving this problem, and provide a comprehensive theoretical analysis of the complexities of the problem itself and of our algorithms. Our algorithms vastly outperform various baselines (by up to two orders of magnitude on real and synthetic datasets).
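
To make the query definition concrete, the following is a minimal brute-force sketch, not one of the algorithms proposed in the paper. It assumes that each record is a `(timestamp, attributes)` pair, that the durability window of length `tau` ends at the record's own timestamp (a window starting at the timestamp would be symmetric), and that a record is "within the top $k$" when fewer than $k$ records in its window score strictly higher. The names `durable_top_k`, `tau`, and `Record` are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical record layout: (timestamp, attribute vector).
Record = Tuple[float, Sequence[float]]


def durable_top_k(
    records: List[Record],
    f: Callable[[Sequence[float]], float],
    k: int,
    tau: float,
) -> List[Record]:
    """Brute-force durable top-k query (quadratic time, illustration only).

    A record with timestamp t is reported if fewer than k records with
    timestamps in the look-back window [t - tau, t] score strictly higher
    under the scoring function f.
    """
    result = []
    for t, attrs in records:
        score = f(attrs)
        # Count records inside the window that beat this record's score.
        better = sum(
            1
            for u, other in records
            if t - tau <= u <= t and f(other) > score
        )
        if better < k:
            result.append((t, attrs))
    return result


# Toy data: one attribute per record, scored by that attribute.
games = [(1, [50]), (2, [81]), (3, [60]), (4, [70]), (12, [40])]
top1 = durable_top_k(games, f=lambda a: a[0], k=1, tau=5)
top2 = durable_top_k(games, f=lambda a: a[0], k=2, tau=5)
```

Because `k`, `tau`, and `f` are all arguments, the sketch mirrors the point that these parameters can be supplied at query time. Its cost grows quadratically with the number of records, so it is best read as a naive reference point.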