In the Categorical Clustering problem, we are given a set of vectors (matrix) A=\{a_1,\ldots,a_n\} over \Sigma^m, where \Sigma is a finite alphabet, and integers k and B. The task is to partition A into k clusters such that the median objective of the clustering in the Hamming norm is at most B. That is, we seek a partition \{I_1,\ldots,I_k\} of \{1,\ldots,n\} and vectors c_1,\ldots,c_k\in\Sigma^m such that \sum_{i=1}^k\sum_{j\in I_i}d_H(c_i,a_j)\leq B, where d_H(a,b) is the Hamming distance between vectors a and b. Fomin, Golovach, and Panolan [ICALP 2018] proved that the problem is fixed-parameter tractable for the binary case \Sigma=\{0,1\} by giving an algorithm that solves it in time 2^{O(B\log B)}(mn)^{O(1)}. We extend this algorithmic result to a popular capacitated clustering model, in which the cluster sizes must also satisfy certain constraints. More precisely, in Capacitated Clustering we are additionally given two non-negative integers p and q, and we seek a clustering with p\leq |I_i|\leq q for all i\in\{1,\ldots,k\}. Our main theorem is that Capacitated Clustering is solvable in time 2^{O(B\log B)}|\Sigma|^B(mn)^{O(1)}. The theorem not only extends the previous algorithmic results to a significantly more general model; it also implies algorithms for several other variants of Categorical Clustering with constraints on cluster sizes.
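To make the objective concrete, the following minimal Python sketch evaluates a given candidate clustering against the definition above. It is only a checker for a fixed partition, not the fixed-parameter algorithm of the paper. The function names (`hamming`, `best_center`, `clustering_cost`, `is_feasible`) are illustrative assumptions, indices are 0-based rather than 1-based, and the centers c_i are chosen coordinate-wise as a most frequent symbol, which minimizes the summed Hamming distance within a cluster.

```python
from collections import Counter

def hamming(a, b):
    """Hamming distance d_H(a, b) between two equal-length vectors."""
    return sum(x != y for x, y in zip(a, b))

def best_center(vectors):
    """A center minimizing the summed Hamming distance to `vectors`:
    pick a most frequent symbol in each coordinate (ties broken arbitrarily)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*vectors))

def clustering_cost(A, clusters):
    """Median objective: sum over clusters I_i of sum_{j in I_i} d_H(c_i, a_j),
    with each c_i chosen optimally for its cluster."""
    total = 0
    for I in clusters:
        members = [A[j] for j in I]
        c = best_center(members)
        total += sum(hamming(c, a) for a in members)
    return total

def is_feasible(A, clusters, k, B, p, q):
    """Check that `clusters` is a partition of the indices 0..n-1 into k parts,
    each of size between p and q, with total cost at most B."""
    covered = sorted(j for I in clusters for j in I)
    return (
        len(clusters) == k
        and covered == list(range(len(A)))
        and all(p <= len(I) <= q for I in clusters)
        and clustering_cost(A, clusters) <= B
    )
```

For example, with vectors over \Sigma=\{0,1\} written as strings, `is_feasible(["0011", "0010", "1100", "1101", "1110"], [[0, 1], [2, 3, 4]], k=2, B=3, p=2, q=3)` checks one candidate clustering with cluster sizes 2 and 3.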