We study the task of determinant maximization under a partition constraint, in the context of large data sets. Given a point set $V\subset \mathbb{R}^d$ that is partitioned into $s$ groups $V_1,\dots,V_s$, and integers $k_1,\dots,k_s$ where $k=\sum_i k_i$, the goal is to pick $k_i$ points from group $i$ such that the overall determinant of the picked $k$ points is maximized. Determinant maximization and its constrained variants have gained a lot of interest for modeling diversity and have found applications in the context of fairness and data summarization. We study the design of composable coresets for the constrained determinant maximization problem. Composable coresets are small subsets of the data that (approximately) preserve optimal solutions to optimization tasks and enable efficient solutions in several other large data models, including the distributed and the streaming settings. In this work, we consider two regimes. For the case of $k>d$, we show a peeling algorithm that gives us a composable coreset of size $kd$ with an approximation factor of $d^{O(d)}$. We complement our results by showing that this approximation factor is tight. For the case of $k\leq d$, we show that a simple modification of the previous algorithms results in an optimal coreset, as verified by our lower bounds. Our results apply to all strongly Rayleigh distributions and several other experimental design problems. In addition, we show coreset construction algorithms under the more general laminar matroid constraints.
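To make the objective concrete, the following is a minimal brute-force sketch of the partition-constrained problem on a toy instance. It is not the coreset or peeling algorithm from this work. It assumes the regime $k\leq d$ and takes "determinant of the picked points" to mean the Gram determinant $\det(PP^\top)$ of the matrix $P$ whose rows are the chosen points. The function names `gram_det` and `best_partition_subset` are hypothetical and used only for illustration.

```python
from itertools import combinations, product

import numpy as np


def gram_det(points):
    """Gram determinant det(P P^T), where the rows of P are the given points.

    Assumption: this is the objective for k <= d (squared volume of the
    parallelepiped spanned by the points).
    """
    P = np.asarray(points, dtype=float)
    return float(np.linalg.det(P @ P.T))


def best_partition_subset(groups, quotas):
    """Exhaustively pick quotas[i] points from groups[i] to maximize gram_det.

    Exponential time; only meant to illustrate the constraint on tiny inputs.
    Returns (best value, tuple of chosen index tuples, one per group).
    """
    per_group = [combinations(range(len(g)), q) for g, q in zip(groups, quotas)]
    best_val, best_choice = -1.0, None
    for choice in product(*per_group):
        picked = [groups[i][j] for i, idx in enumerate(choice) for j in idx]
        val = gram_det(picked)
        if val > best_val:
            best_val, best_choice = val, choice
    return best_val, best_choice


# Toy instance in R^2 with s = 2 groups and k_1 = k_2 = 1.
V1 = [(1.0, 0.0), (2.0, 0.0)]
V2 = [(0.0, 1.0), (0.0, 3.0)]
value, choice = best_partition_subset([V1, V2], [1, 1])
```

On this toy instance the search selects the longer vector from each group, since the two groups lie along orthogonal axes. A composable coreset would replace each local data set by a small subset before such an optimization is run on the union.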