A determinantal point process (DPP) is an elegant model that assigns a probability to every subset of a collection of $n$ items. While conventionally a DPP is parameterized by a symmetric kernel matrix, removing this symmetry constraint, resulting in nonsymmetric DPPs (NDPPs), leads to significant improvements in modeling power and predictive performance. Recent work has studied an approximate Markov chain Monte Carlo (MCMC) sampling algorithm for NDPPs restricted to size-$k$ subsets (called $k$-NDPPs). However, the runtime of this approach is quadratic in $n$, making it infeasible for large-scale settings. In this work, we develop a scalable MCMC sampling algorithm for $k$-NDPPs with low-rank kernels, thus enabling runtime that is sublinear in $n$. Our method is based on a state-of-the-art NDPP rejection sampling algorithm, which we enhance with a novel approach for efficiently constructing the proposal distribution. Furthermore, we extend our scalable $k$-NDPP sampling algorithm to NDPPs without size constraints. Our resulting sampling method has polynomial time complexity in the rank of the kernel, while the existing approach has runtime that is exponential in the rank. With both a theoretical analysis and experiments on real-world datasets, we verify that our scalable approximate sampling algorithms are orders of magnitude faster than existing sampling approaches for $k$-NDPPs and NDPPs.
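To make the objects in the abstract concrete, here is a minimal sketch of how an NDPP and a $k$-NDPP assign probabilities to subsets. It is illustrative only and is not the MCMC or rejection sampling algorithm developed in the paper. The low-rank form $L = VV^\top + B(D - D^\top)B^\top$ is an assumed parameterization that the abstract does not state, and the function names `ndpp_kernel`, `subset_prob`, and `k_subset_probs` are hypothetical. The sketch relies on two standard facts: an NDPP assigns $\Pr(Y) = \det(L_Y)/\det(L + I)$, and a $k$-NDPP assigns $\Pr(Y) \propto \det(L_Y)$ over subsets of size $k$.

```python
import itertools
import numpy as np


def ndpp_kernel(V, B, D):
    """Assumed low-rank nonsymmetric kernel: L = V V^T + B (D - D^T) B^T.

    V V^T is the symmetric PSD part and B (D - D^T) B^T is skew-symmetric,
    so every principal minor of L is nonnegative.
    """
    return V @ V.T + B @ (D - D.T) @ B.T


def subset_prob(L, Y):
    """NDPP probability of subset Y: det(L_Y) / det(L + I)."""
    n = L.shape[0]
    Y = list(Y)
    # The determinant of the empty submatrix is 1 by convention.
    num = np.linalg.det(L[np.ix_(Y, Y)]) if Y else 1.0
    return num / np.linalg.det(L + np.eye(n))


def k_subset_probs(L, k):
    """Brute-force k-NDPP: Pr(Y) proportional to det(L_Y) over all size-k subsets.

    Enumerates all C(n, k) subsets, so it is only usable for tiny n.
    """
    n = L.shape[0]
    subsets = list(itertools.combinations(range(n), k))
    weights = np.array([np.linalg.det(L[np.ix_(S, S)]) for S in subsets])
    return subsets, weights / weights.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 6, 2  # n items, rank-d factors (hypothetical sizes)
    V = rng.normal(size=(n, d))
    B = rng.normal(size=(n, d))
    D = rng.normal(size=(d, d))
    L = ndpp_kernel(V, B, D)

    subsets, probs = k_subset_probs(L, k=2)
    # Draw one size-2 subset exactly from the enumerated distribution.
    sample = subsets[rng.choice(len(subsets), p=probs)]
    print("sampled subset:", sample)
```

Enumerating every size-$k$ subset is exponential in $k$ and is used here only to show what distribution a sampler targets; avoiding this enumeration at large $n$ is the problem the scalable sampling algorithms address.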