Suppose an oracle knows a string $S$ that is unknown to us and that we want to determine. The oracle can answer queries of the form "Is $s$ a substring of $S$?". In 1995, Skiena and Sundaram showed that, in the worst case, any algorithm must ask the oracle $\sigma n/4 - O(n)$ queries to reconstruct the hidden string, where $\sigma$ is the size of the alphabet of $S$ and $n$ is its length, and gave an algorithm that spends $(\sigma-1)n+O(\sigma \sqrt{n})$ queries to reconstruct $S$. The main contribution of our paper is to improve this upper bound when the string is compressible. We first present a universal algorithm that, given a (computable) compressor that compresses the string to $\tau$ bits, performs $q=O(\tau)$ substring queries; this algorithm, however, runs in exponential time. For this reason, the second part of the paper focuses on more time-efficient algorithms whose number of queries is bounded by specific compressibility measures. We first show that any string of length $n$ over an integer alphabet of size $\sigma$ with $rle$ runs can be reconstructed with $q=O(rle (\sigma + \log \frac{n}{rle}))$ substring queries in linear time and space. We then present an algorithm that spends $q \in O(\sigma g\log n)$ substring queries and runs in $O(n(\log n + \log \sigma) + q)$ time using linear space, where $g$ is the size of a smallest straight-line program generating the string.
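
To make the query model concrete, the following is a minimal sketch of a naive baseline, not one of the algorithms of this paper nor the Skiena and Sundaram algorithm. It grows a known substring to the right until no character extends it (at which point it must be a suffix of $S$), then grows it to the left until it equals $S$. The names `SubstringOracle`, `is_substring`, and `reconstruct` are hypothetical and introduced only for illustration; the alphabet is assumed to be known in advance.

```python
class SubstringOracle:
    """Hypothetical oracle: hides a string and counts substring queries."""

    def __init__(self, hidden):
        self._hidden = hidden
        self.queries = 0

    def is_substring(self, s):
        # Answers the query "Is s a substring of S?" and counts it.
        self.queries += 1
        return s in self._hidden


def _extend(oracle, alphabet, cur, to_right):
    """Greedily extend cur by one character at a time in one direction."""
    while True:
        for c in alphabet:
            cand = cur + c if to_right else c + cur
            if oracle.is_substring(cand):
                cur = cand
                break
        else:
            # No character extends cur in this direction.
            return cur


def reconstruct(oracle, alphabet):
    """Naive reconstruction using at most sigma * (n + 2) queries."""
    # A substring that cannot be extended to the right occurs only
    # at the end of S, so it is a suffix of S.
    suffix = _extend(oracle, alphabet, "", True)
    # Extending that suffix to the left until no character fits yields S.
    return _extend(oracle, alphabet, suffix, False)


oracle = SubstringOracle("abracadabra")
recovered = reconstruct(oracle, "abcdr")
print(recovered, oracle.queries)
```

Each of the $n$ successful extensions and each of the two failed rounds costs at most $\sigma$ queries, so this baseline stays within $\sigma(n+2)$ queries regardless of how compressible $S$ is, which is the kind of bound the compression-sensitive algorithms above aim to beat.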