We study fundamental clustering problems for incomplete data. In this setting, we are given a set of incomplete d-dimensional Boolean vectors (representing the rows of a matrix), and the goal is to complete the missing vector entries so that the set of complete vectors admits a partitioning into at most k clusters with radius or diameter at most r. We develop a toolkit and use it to give tight characterizations of the parameterized complexity of these problems with respect to the parameters k, r, and the minimum number of rows and columns needed to cover all the missing entries. We show that the aforementioned problems are fixed-parameter tractable when parameterized by the three parameters combined, and that dropping any of these three parameters results in parameterized intractability. We extend this toolkit to settle the parameterized complexity of other clustering problems, answering an open question along the way. We also show how our results can be extended to data over any constant-size domain. A byproduct of our results is that, for the complete data setting, all problems under consideration are fixed-parameter tractable parameterized by k+r.
翻译:我们研究数据不完整的基本组群问题。 在这种背景下, 我们得到一组不完整的维维布林矢量( 代表矩阵的行), 目标是完成缺失的矢量条目, 这样完整的矢量组能够接受最多以半径或直径为单位的最多 k 组群的分隔。 我们开发了一个工具包, 并用它来对这些参数 k、 r 和覆盖所有缺失条目所需的最小行数和列数的参数复杂性进行严格的定性。 我们显示, 这三个参数结合参数时, 上述问题是可以固定的参数, 并且 丢弃这三个参数中的任何参数都会导致参数的吸引力。 我们扩展该工具包, 以解决其他组群问题的参数复杂性, 沿着路径回答一个开放的问题 。 我们还展示了我们的结果可以如何扩展到任何恒定大小域的数据。 我们结果的副产品是, 在完整数据设置时, 所考虑的所有问题都由 k+r 设定固定的参数 。