Quantiles, such as the median or percentiles, provide concise and useful information about the distribution of a collection of items drawn from a linearly ordered universe. We study data structures, called quantile summaries, which keep track of all quantiles up to an error of at most $\varepsilon$. That is, an $\varepsilon$-approximate quantile summary first processes a stream of items and then, given any quantile query $0\le \phi\le 1$, returns an item from the stream that is a $\phi'$-quantile for some $\phi' = \phi \pm \varepsilon$. We focus on comparison-based quantile summaries, which can only compare two items and are otherwise completely oblivious of the universe. The best such deterministic quantile summary to date, by Greenwald and Khanna (ACM SIGMOD '01), stores at most $O(\frac{1}{\varepsilon}\cdot \log \varepsilon N)$ items, where $N$ is the number of items in the stream. We prove that this space bound is optimal by providing a matching lower bound. Our result thus rules out the possibility of constructing a deterministic comparison-based quantile summary in space $f(\varepsilon)\cdot o(\log N)$, for any function $f$ that does not depend on $N$. As a consequence of our results, we also obtain a lower bound for randomized algorithms.
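To make the definition of an $\varepsilon$-approximate quantile summary concrete, the following Python sketch may help. It is not the Greenwald and Khanna summary and not a streaming algorithm: it is an offline baseline that sorts the whole stream first, and it only illustrates what a correct answer to a query looks like. The function names (`is_eps_approx_quantile`, `offline_summary`, `query`) are hypothetical, and the sketch assumes that a $\phi$-quantile is an item whose rank in sorted order is about $\phi N$ and that $\varepsilon N \ge 1$.

```python
import math
from bisect import bisect_left, bisect_right


def is_eps_approx_quantile(stream, item, phi, eps):
    """Check whether `item` is a phi'-quantile for some phi' in [phi - eps, phi + eps].

    Assumption: an item is a phi'-quantile if one of its 1-based ranks in the
    sorted stream equals phi' * N. Only comparisons between items are used.
    """
    s = sorted(stream)
    n = len(s)
    lo = bisect_left(s, item) + 1   # smallest rank the item can take
    hi = bisect_right(s, item)      # largest rank the item can take
    if hi < lo:                     # the item does not occur in the stream
        return False
    # The rank interval [lo, hi] must intersect [(phi - eps) N, (phi + eps) N].
    return lo <= (phi + eps) * n and hi >= (phi - eps) * n


def offline_summary(stream, eps):
    """Offline baseline (assumes eps * N >= 1): keep every `step`-th item.

    Stores roughly 1 / (2 * eps) pairs (rank, item), with consecutive stored
    ranks at most 2 * eps * N apart.
    """
    s = sorted(stream)
    n = len(s)
    step = max(1, math.floor(2 * eps * n))
    ranks = list(range(1, n + 1, step))
    if ranks[-1] != n:
        ranks.append(n)             # always keep the maximum
    return [(r, s[r - 1]) for r in ranks]


def query(summary, n, phi):
    """Return the stored item whose rank is closest to phi * N."""
    target = phi * n
    _, item = min(summary, key=lambda pair: abs(pair[0] - target))
    return item


if __name__ == "__main__":
    stream = list(range(999, -1, -1))          # 1000 distinct items, reversed
    summary = offline_summary(stream, eps=0.05)
    median = query(summary, len(stream), 0.5)
    print(len(summary), is_eps_approx_quantile(stream, median, 0.5, 0.05))
```

The baseline stores a number of items that is independent of $N$ only because it sees the entire sorted input before choosing what to keep. A streaming summary must decide what to discard as items arrive, which is the setting in which the $\log \varepsilon N$ factor discussed above arises.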