Existing algorithms for processing skyline queries do not scale to big data. This paper proposes two sampling-based approximate skyline algorithms. The first algorithm draws a fixed-size sample and computes an approximate skyline on that sample. Its error is relatively small in most cases and is almost independent of the size of the input relation. The second algorithm returns an $(\epsilon,\delta)$-approximation of the exact skyline. The sample size it requires, and hence its running time, can be regarded as constant with respect to the size of the input relation. Experiments verify the error analysis of the first algorithm and show that the second algorithm is much faster than existing skyline algorithms.
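The basic idea behind the first algorithm is to draw a fixed-size sample and compute the skyline of that sample. The sketch below illustrates that idea under our own assumptions and is not the authors' implementation. It assumes that smaller values are preferred in every dimension, uses uniform sampling without replacement, and computes the skyline of the sample with a simple block-nested-loop scan. The names `dominates`, `skyline`, and `approx_skyline` and the `sample_size` and `seed` parameters are hypothetical. The sketch does not cover the paper's error analysis or the $(\epsilon,\delta)$ guarantee of the second algorithm.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Assumed minimization semantics: p dominates q if p <= q in every
    dimension and p < q in at least one dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Exact skyline via a block-nested-loop style scan (O(n^2) worst case)."""
    window: List[Point] = []
    for p in points:
        # Skip p if some point already kept dominates it.
        if any(dominates(w, p) for w in window):
            continue
        # Otherwise evict every kept point that p dominates, then keep p.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


def approx_skyline(relation: Sequence[Point], sample_size: int, seed: int = 0) -> List[Point]:
    """Hypothetical sketch: draw a uniform sample of fixed size without
    replacement, then return the exact skyline of that sample only."""
    rng = random.Random(seed)
    k = min(sample_size, len(relation))
    sample = rng.sample(relation, k)
    return skyline(sample)
```

Because the cost of `skyline(sample)` depends only on `sample_size`, this step does not grow with the relation. The trade-off is that the result may omit true skyline points that were not sampled, and it may include sampled points that are dominated by unsampled ones.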