关于流中伪确定性计数的下界 (Lower Bounds for Pseudo-Deterministic Counting in a Stream)

Many streaming algorithms provide only a high-probability relative approximation. These two relaxations, of allowing approximation and randomization, seem necessary -- for many streaming problems, both relaxations must be employed simultaneously, to avoid an exponentially larger (and often trivial) space complexity. A common drawback of these randomized approximate algorithms is that independent executions on the same input have different outputs, that depend on their random coins. Pseudo-deterministic algorithms combat this issue, and for every input, they output with high probability the same ``canonical'' solution. We consider perhaps the most basic problem in data streams, of counting the number of items in a stream of length at most $n$. Morris's counter [CACM, 1978] is a randomized approximation algorithm for this problem that uses $O(\log\log n)$ bits of space, for every fixed approximation factor (greater than $1$). Goldwasser, Grossman, Mohanty and Woodruff [ITCS 2020] asked whether pseudo-deterministic approximation algorithms can match this space complexity. Our main result answers their question negatively, and shows that such algorithms must use $\Omega(\sqrt{\log n / \log\log n})$ bits of space. Our approach is based on a problem that we call Shift Finding, and may be of independent interest. In this problem, one has query access to a shifted version of a known string $F\in\{0,1\}^{3n}$, which is guaranteed to start with $n$ zeros and end with $n$ ones, and the goal is to find the unknown shift using a small number of queries. We provide for this problem an algorithm that uses $O(\sqrt{n})$ queries. It remains open whether $poly(\log n)$ queries suffice; if true, then our techniques immediately imply a nearly-tight $\Omega(\log n/\log\log n)$ space bound for pseudo-deterministic approximate counting.

翻译：许多流式算法只提供高概率的相对近似值。这两种放宽——允许近似和随机化——似乎是必要的。对于许多流式问题，这两种放宽必须同时使用，以避免指数级更大（而且通常是微不足道）的空间复杂度。这些随机近似算法的一个普遍缺点是，相同输入的独立执行具有不同的输出，这取决于它们的随机金币。伪确定性算法解决了这个问题，对于每个输入，它们以高概率输出相同的“规范”解决方案。我们考虑数据流中可能是最基本的问题：计算流的项数，其长度不超过 $n$。Morris's 计数器 [CACM，1978] 是一个用于此问题的随机近似算法，对于每个固定的近似因子（大于 $1$），使用 $O(\log\log n)$ 位空间。Goldwasser、Grossman、Mohanty 和 Woodruff [ITCS 2020] 询问伪确定性近似算法是否可以匹配此空间复杂度。我们的主要结果否定了他们的问题，并表明这样的算法必须使用 $ \Omega(\sqrt{\log n / \log\log n})$ 位空间。我们的方法基于一个问题，我们称之为 Shift Finding，并且可能具有独立的兴趣。在这个问题中，人们可以访问一个已知字符串 $F\in\{0,1\}^{3n}$ 的移位版本，它保证以 $n$ 个零开头和以 $n$ 个一结尾，目标是使用少量的查询找到未知的移位。我们为此问题提供了一个使用 $O(\sqrt{n})$ 查询的算法。是否存在 $poly(\log n)$ 查询仍然是未知的；如果是真的，那么我们的技术会立即导致一个几乎紧密的 $\Omega(\log n/ \log\log n)$ 空间界，适用于伪确定性近似计数。