In the Shortest Common Superstring problem (SCS), one needs to find the shortest superstring for a set of strings. While SCS is NP-hard and MAX-SNP-hard, the Greedy Algorithm "choose two strings with the largest overlap; merge them; repeat" achieves a constant factor approximation that is known to be at most 3.5 and conjectured to be equal to 2. The Greedy Algorithm is not deterministic, so its instantiations with different tie-breaking rules may have different approximation factors. In this paper, we show that it is not the case: all factors are equal. To prove this, we show how to transform a set of strings so that all overlaps are different whereas their ratios stay roughly the same. We also reveal connections between the original version of SCS and the following one: find a~superstring minimizing the number of occurrences of a given symbol. It turns out that the latter problem is equivalent to the original one.
翻译:在最短常见的超级字符串问题(SCS)中,人们需要找到一组字符串中最短的超级字符串。虽然 SCS 是 NP-hard 和 MAX-SNP-hard, 但贪婪的 Algorithm “ 选择两个字符串与最大重叠; 合并; 重复 ” 能够实现一个常数系数近似值, 已知该值最多为3.5, 并被推断为等于 2. 贪婪的 ALgorithm 不具有确定性, 因此它与不同断结规则的即时反应可能有不同的近似因素。 在本文中, 我们显示情况并非如此: 所有因素都是平等的。 为了证明这一点, 我们展示如何转换一组字符串, 以便所有重叠都不同, 而它们的比例大致相同 。 我们还揭示了 SCSCS 原版和以下符号之间的联系: 找到一个 ~ 超级最小化的最小化一个符号发生次数。 。 事实证明, 后一个问题相当于原符号 。