Given two equally long, uniformly random binary strings, the expected length of their longest common subsequence (LCS) is asymptotically proportional to the strings' length. Finding the proportionality coefficient $\gamma$, i.e. the limit of the normalised LCS length for two random binary strings of length $n \to \infty$, is a very natural problem, first posed by Chv\'atal and Sankoff in 1975, and as yet unresolved. This problem has relevance to diverse fields ranging from combinatorics and algorithm analysis to coding theory and computational biology. Using methods of statistical mechanics, as well as some existing results on the combinatorial structure of LCS, we link constant $\gamma$ to the parameters of a certain stochastic particle process. These parameters are determined by a specific (large) system of polynomial equations with integer coefficients, which implies that $\gamma$ is an algebraic number. Short of finding an exact closed-form solution for such a polynomial system, which appears to be unlikely, our approach essentially resolves the Chv\'atal-Sankoff problem, albeit in a somewhat unexpected way with a rather negative flavour.
翻译:在两个同样长、统一随机的二进制字符串中,其最长共同子序列(LCS)的预期长度与字符串长度成比例。找到比例系数 $\gamma$,即两个随机长度($n\ to\infty$)的正常的LCS长度,这是一个非常自然的问题,首先由Chv\'atal和Sankoff在1975年提出,目前尚未解决。这个问题与不同领域有关,从组合法和算法分析到编码理论和计算生物学。利用统计机学方法以及LCS组合结构的某些现有结果,我们把常价$\gamma$与某些随机粒子过程的参数联系起来。这些参数是由一个特定(大)的具有整数系数的多元方程式系统决定的,这意味着$\gamma$是一个代数。除了为这种多元系统找到一个精确的封闭式解决方案之外,还差一些问题,尽管我们的方法似乎不太有可能解决,但实际上还是有点困难。