Motivated by applications in instance selection, we introduce the star discrepancy subset selection problem, which consists of finding a subset of m out of n points that minimizes the star discrepancy. First, we show that this problem is NP-hard. Then, we introduce a mixed integer linear formulation (MILP) and a combinatorial branch-and-bound (BB) algorithm for the star discrepancy subset selection problem and we evaluate both approaches against random subset selection and a greedy construction on different use-cases in dimension two and three. Our results show that the MILP and BB are efficient in dimension two for large and small $m/n$ ratio, respectively, and for not too large n. However, the performance of both approaches decays strongly for larger dimensions and set sizes. As a side effect of our empirical comparisons we obtain point sets of discrepancy values that are much smaller than those of common low-discrepancy sequences, random point sets, and of Latin Hypercube Sampling. This suggests that subset selection could be an interesting approach for generating point sets of small discrepancy value.
翻译:我们引入了恒星差异子集选择问题, 包括从 n点中找到一个 m 子集, 从而将恒星差异最小化。 首先, 我们显示这个问题是NP- 硬的。 然后, 我们为恒星差异子集选择问题引入了混合整数线性配方( MILP) 和组合分支和约束( BB) 算法, 我们根据随机子集选择和对第二和三维不同使用案例的贪婪构造来评估两种方法。 我们的结果表明, MILP 和 BB 在第二维度上效率很高, 分别为大和小, 美/ 美/ 美/ 美/ 美。 但是, 这两种方法的性能在更大的维度和设定大小上都严重衰败。 作为我们实验性比较的副作用, 我们获得了比常见的低差异序列、 随机点数和拉丁超立方 抽样 的点值要小得多。 这说明, 子集选择可能是产生小点差异值的有趣方法 。