Longest Run Subsequence is a problem introduced recently in the context of the scaffolding phase of genome assembly (Schrinner et al., WABI 2020). The problem asks for a maximum length subsequence of a given string that contains at most one run for each symbol (a run is a maximum substring of consecutive identical symbols). The problem has been shown to be NP-hard and to be fixed-parameter tractable when the parameter is the size of the alphabet on which the input string is defined. In this paper we further investigate the complexity of the problem and we show that it is fixed-parameter tractable when it is parameterized by the number of runs in a solution, a smaller parameter. Moreover, we investigate the kernelization complexity of Longest Run Subsequence and we prove that it does not admit a polynomial kernel when parameterized by the size of the alphabet or by the number of runs. Finally, we consider the restriction of Longest Run Subsequence when each symbol has at most two occurrences in the input string and we show that it is APX-hard.
翻译:在基因组组组装的脚架阶段(Schrinner等人,WABI 2020)最近引入了一个长期运行子序列问题。 问题要求给每个符号最多一个运行的字符串的最大长度子序列( 运行是连续相同符号的最大子字符串) 。 当参数为输入字符串定义的字母大小时, 问题被证明是NP硬的, 并且可以被固定参数牵引。 在本文中, 我们进一步调查问题的复杂性, 并且我们显示, 当每个符号以一个解决方案中的运行次数、 一个较小参数作为参数的参数时, 它是固定参数可移动的。 此外, 我们调查了长运行子序列的内脏复杂性, 并且我们证明, 当以字母大小或运行次数作为参数参数时, 它不接受一个多核内核内核内核。 最后, 当每个符号在输入字符串中最多两次发生时, 我们考虑对长运行子序列的限制, 我们显示它是 APX- hard 。