Quantiles, such as the median or percentiles, provide concise and useful information about the distribution of a collection of items drawn from a totally ordered universe. We study data structures, called quantile summaries, which keep track of all quantiles up to an additive error of at most $\varepsilon$. That is, an $\varepsilon$-approximate quantile summary first processes a stream of items and then, given any quantile query $0\le \phi\le 1$, returns an item from the stream that is a $\phi'$-quantile for some $\phi' \in [\phi - \varepsilon, \phi + \varepsilon]$. We focus on comparison-based quantile summaries, which can only compare two items and are otherwise completely oblivious of the universe. The best such deterministic quantile summary to date, due to Greenwald and Khanna (SIGMOD '01), stores at most $O(\frac{1}{\varepsilon}\cdot \log (\varepsilon N))$ items, where $N$ is the number of items in the stream. We prove that this space bound is optimal by showing a matching lower bound. Our result thus rules out the possibility of constructing a deterministic comparison-based quantile summary in space $f(\varepsilon)\cdot o(\log N)$, for any function $f$ that does not depend on $N$. As a corollary, we improve the lower bound for biased quantiles, which provide a stronger, relative-error guarantee of $(1\pm \varepsilon)\cdot \phi$, and for other related computational tasks.
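To make the $\varepsilon$-approximation guarantee concrete, the following minimal Python sketch builds a summary offline and checks its answers against the definition. It is not the Greenwald and Khanna algorithm and it is not a streaming algorithm: it sorts the whole input first, which is exactly what a small-space summary cannot do. The function names (`build_offline_summary`, `query`, `is_eps_approx`) are hypothetical, and the sketch assumes that a $\phi$-quantile is an item whose rank in sorted order is $\phi N$ (rounding conventions vary) and that $\varepsilon N \ge 1$.

```python
import bisect
import math


def build_offline_summary(stream, eps):
    """Keep items at ranks spaced at most 2*eps*N apart (offline, for illustration only)."""
    s = sorted(stream)
    n = len(s)
    step = max(1, math.floor(2 * eps * n))
    ranks = list(range(1, n + 1, step))
    if ranks[-1] != n:
        ranks.append(n)  # always keep the maximum so the top quantiles are covered
    # Each entry is (1-based rank, item).
    return [(r, s[r - 1]) for r in ranks], n


def query(summary, n, phi):
    """Return the stored item whose rank is closest to phi * n."""
    target = phi * n
    return min(summary, key=lambda entry: abs(entry[0] - target))[1]


def is_eps_approx(stream, item, phi, eps):
    """Check that some rank of `item` lies within eps*N of phi*N (ties give a rank range)."""
    s = sorted(stream)
    n = len(s)
    lo = bisect.bisect_left(s, item) + 1   # smallest 1-based rank of item
    hi = bisect.bisect_right(s, item)      # largest 1-based rank of item
    if hi < lo:
        return False  # item does not occur in the stream
    return lo - eps * n <= phi * n <= hi + eps * n
```

With $N = 1000$ distinct items and $\varepsilon = 0.01$, this sketch keeps about $1/(2\varepsilon)$ items. That size is achievable here only because the full sorted input is available; the comparison-based streaming setting studied in this paper is where the additional $\log(\varepsilon N)$ factor arises.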