Motivated by applications in instance selection, we introduce the \emph{star discrepancy subset selection problem}, which consists of finding a subset of \(m\) out of \(n\) points that minimizes the star discrepancy. We introduce two mixed integer linear formulations (MILP) and a combinatorial branch-and-bound (BB) algorithm for this problem and we evaluate our approaches against random subset selection and a greedy construction on different use-cases in dimension two and three. Our results show that one of the MILPs and BB are efficient in dimension two for large and small $m/n$ ratio, respectively, and for not too large $n$. However, the performance of both approaches decays strongly for larger dimensions and set sizes. As a side effect of our empirical comparisons we obtain point sets of discrepancy values that are much smaller than those of common low-discrepancy sequences, random point sets, and of Latin Hypercube Sampling. This suggests that subset selection could be an interesting approach for generating point sets of small discrepancy value.
翻译:由实例选择中的应用程序驱动, 我们引入了 emph{ star 差异子集选择问题, 包括从\ (n\) 点中找到 \ (m\) 子集, 以最小化恒星差异。 我们为此引入了两种混合整线配方( MILP) 和组合分支和约束( BB) 算法, 我们根据随机子集选择和对第二和三维不同使用案例的贪婪构造来评估我们的方法。 我们的结果表明, MILP 和 BB 中的一种在二维中分别对大和小 美元/ 美元比率有效, 而不是太高 美元 。 然而, 两种方法的性能在更大的尺寸和设定大小上都严重衰减 。 作为我们实验性比较的副作用, 我们获得的点差异值比常见的低差异序列、 随机点集和拉丁超立方点取样的点值要小得多。 这表明, 子选择可能是产生小差异点值的有趣方法 。