This paper proposes a multilevel sampling algorithm for fiber sampling problems in algebraic statistics, inspired by Henry Wynn's suggestion to adapt multilevel Monte Carlo (MLMC) ideas to discrete models. Focusing on log-linear models, we sample from high-dimensional lattice fibers defined by algebraic constraints. Building on Markov basis methods and the results of Diaconis and Sturmfels, our algorithm uses variable step sizes to accelerate exploration of the fiber and reduce the need for long burn-in periods. We introduce a novel Fiber Coverage Score (FCS), based on Voronoi partitioning, to assess sample quality, and we highlight the usefulness of Maximum Mean Discrepancy (MMD) as a sample-quality metric. Simulations on benchmark fibers show that multilevel sampling outperforms naive MCMC approaches. Our results demonstrate that multilevel methods, when properly applied, provide practical benefits for discrete sampling in algebraic statistics.
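To make the fiber-sampling setting concrete, here is a minimal sketch, assuming the simplest log-linear model (independence in a two-way table). Its fiber is the set of nonnegative integer tables with fixed row and column sums, and the basic 2x2 moves form a Markov basis for it (Diaconis and Sturmfels). The step-size levels `levels=(1, 2, 4)` and the names `basic_moves`, `margins` and `walk` are hypothetical illustrations of the variable-step-size idea, not the multilevel algorithm proposed in this paper.

```python
import random

def basic_moves(rows, cols):
    """Basic 2x2 moves: +1/-1 on a 2x2 minor, preserving all row and column sums."""
    moves = []
    for i in range(rows):
        for i2 in range(i + 1, rows):
            for j in range(cols):
                for j2 in range(j + 1, cols):
                    m = [[0] * cols for _ in range(rows)]
                    m[i][j], m[i2][j2] = 1, 1
                    m[i][j2], m[i2][j] = -1, -1
                    moves.append(m)
    return moves

def margins(table):
    """Row and column sums: the sufficient statistics that define the fiber."""
    row_sums = [sum(r) for r in table]
    col_sums = [sum(c) for c in zip(*table)]
    return row_sums, col_sums

def walk(start, moves, steps, levels=(1, 2, 4), seed=0):
    """Random walk on the fiber with a randomly chosen step size per proposal.

    Hypothetical sketch: pick a Markov basis move, a magnitude from `levels`
    and a random sign; accept if the table stays nonnegative, else stay put.
    """
    rng = random.Random(seed)
    x = [row[:] for row in start]
    samples = []
    for _ in range(steps):
        m = rng.choice(moves)
        k = rng.choice(levels) * rng.choice((1, -1))
        proposal = [[x[i][j] + k * m[i][j] for j in range(len(x[0]))]
                    for i in range(len(x))]
        if all(v >= 0 for row in proposal for v in row):
            x = proposal  # proposal lies in the fiber: move there
        samples.append([row[:] for row in x])
    return samples

start = [[3, 1, 2], [0, 2, 4]]
samples = walk(start, basic_moves(2, 3), steps=1000)
print(all(margins(s) == margins(start) for s in samples))
```

Because the proposal (move, magnitude, sign) is symmetric and infeasible proposals are rejected, this chain targets the uniform distribution on the fiber. Keeping step size 1 among the levels preserves connectivity through the Markov basis, while the larger steps let the chain jump across the fiber in fewer iterations.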