Sensitivity of string compressors and repetitiveness measures

The sensitivity of a string compression algorithm $C$ asks how much the output size $C(T)$ for an input string $T$ can increase when a single character edit operation is performed on $T$. This notion makes it possible to measure the robustness of compression algorithms against errors and/or dynamic changes in the input string. In this paper, we analyze the worst-case multiplicative sensitivity of string compression algorithms, defined by $\max_{T \in \Sigma^n}\{C(T')/C(T) : \mathrm{ed}(T, T') = 1\}$, where $\mathrm{ed}(T, T')$ denotes the edit distance between $T$ and $T'$. For the most common versions of the Lempel-Ziv 77 compressors, we prove that the worst-case multiplicative sensitivity is only a small constant (2 or 3, depending on the version of the Lempel-Ziv 77 factorization and the edit operation type). We strengthen our upper bounds by presenting matching lower bounds on the worst-case sensitivity for all these major versions of the Lempel-Ziv 77 factorization. This contrasts with previously known results: the size $z_{\rm 78}$ of the Lempel-Ziv 78 factorization can increase by a factor of $\Omega(n^{1/4})$ [Lagarde and Perifel, 2018], and the number $r$ of runs in the Burrows-Wheeler transform can increase by a factor of $\Omega(\log n)$ [Giuliani et al., 2021] when a character is prepended to an input string of length $n$. We also study the worst-case sensitivity of several grammar compression algorithms, including Bisection, AVL-grammar, GCIS, and CDAWG. Further, we extend the notion of worst-case sensitivity to string repetitiveness measures such as the smallest string attractor size $\gamma$ and the substring complexity $\delta$. We present non-trivial upper and lower bounds on the worst-case multiplicative sensitivity for $\gamma$, and matching upper and lower bounds on the worst-case multiplicative sensitivity for $\delta$.
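To make the definition of worst-case multiplicative sensitivity concrete, here is a minimal brute-force sketch for small $n$. It assumes one particular LZ77 variant (greedy, self-referencing, with no extra trailing character per phrase), which is only one of the several versions the abstract refers to. It also uses the substring complexity $\delta(T) = \max_{k \ge 1} d_k(T)/k$, where $d_k(T)$ is the number of distinct length-$k$ substrings of $T$. The function names `lz77_size`, `delta`, `single_edits`, and `sensitivity` are hypothetical. Exhaustive search gives the exact maximum only for one small $n$ and alphabet, and says nothing about asymptotic behavior.

```python
from fractions import Fraction
from itertools import product


def lz77_size(t: str) -> int:
    """Phrase count of a greedy LZ77 parse (assumed variant: self-referencing,
    no trailing literal). Each phrase is either one fresh character or the
    longest prefix of t[i:] that also starts at an earlier position j < i;
    the copied source may overlap position i."""
    n, i, z = len(t), 0, 0
    while i < n:
        best = 0
        for j in range(i):
            length = 0
            while i + length < n and t[j + length] == t[i + length]:
                length += 1
            best = max(best, length)
        i += max(best, 1)  # best == 0 means t[i] is a fresh character
        z += 1
    return z


def delta(t: str) -> Fraction:
    """Substring complexity: max over k >= 1 of d_k(t) / k, where d_k(t) is
    the number of distinct length-k substrings of t (0 for the empty string)."""
    n = len(t)
    return max(
        (Fraction(len({t[i:i + k] for i in range(n - k + 1)}), k)
         for k in range(1, n + 1)),
        default=Fraction(0),
    )


def single_edits(t: str, alphabet: str, op: str):
    """Yield every string obtained from t by exactly one edit of type op
    ('sub', 'ins', or 'del'); each result is at edit distance 1 from t."""
    n = len(t)
    if op == "sub":
        for i in range(n):
            for c in alphabet:
                if c != t[i]:
                    yield t[:i] + c + t[i + 1:]
    elif op == "ins":
        for i in range(n + 1):
            for c in alphabet:
                yield t[:i] + c + t[i:]
    elif op == "del":
        for i in range(n):
            yield t[:i] + t[i + 1:]
    else:
        raise ValueError(f"unknown edit operation: {op}")


def sensitivity(measure, n: int, alphabet: str, op: str) -> Fraction:
    """Exact max of measure(T') / measure(T) over all T in alphabet^n (n >= 1)
    and all T' reachable from T by one edit of type op."""
    best = Fraction(0)
    for letters in product(alphabet, repeat=n):
        t = "".join(letters)
        base = Fraction(measure(t))
        for t2 in single_edits(t, alphabet, op):
            best = max(best, Fraction(measure(t2)) / base)
    return best
```

For example, `sensitivity(lz77_size, 4, "ab", "sub")` returns `Fraction(2, 1)`. Every binary string of length 4 has between 2 and 4 phrases under this parse, and substituting one character of `aaaa` (parsed as a|aaa) yields `aaba` (parsed as a|a|b|a). Ratios are kept as `Fraction` values so that ties and exact values are not blurred by floating-point rounding.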