PAC Mode Estimation using PPR Martingale Confidence Sequences

Shubham Anand Jain, Rohan Shah, Sanit Gupta, Denil Mehta, Inderjeet Jayakumar Nair, Jian Vora, Sushil Khyalia, Sourav Das, Vinay J. Ribeiro, Shivaram Kalyanakrishnan

From arXiv, 30 pages, 2 figures

We consider the problem of correctly identifying the mode of a discrete distribution $\mathcal{P}$ with sufficiently high probability by observing a sequence of i.i.d. samples drawn according to $\mathcal{P}$. This problem reduces to the estimation of a single parameter when $\mathcal{P}$ has a support set of size $K = 2$. Noting the efficiency of prior-posterior-ratio (PPR) martingale confidence sequences for handling this special case, we propose a generalisation to mode estimation, in which $\mathcal{P}$ may take $K \geq 2$ values. We observe that the "one-versus-one" principle yields a more efficient generalisation than the "one-versus-rest" alternative. Our resulting stopping rule, denoted PPR-ME, is optimal in its sample complexity up to a logarithmic factor. Moreover, PPR-ME empirically outperforms several other competing approaches for mode estimation. We demonstrate the gains offered by PPR-ME in two practical applications: (1) sample-based forecasting of the winner in indirect election systems, and (2) efficient verification of smart contracts in permissionless blockchains.
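
The abstract describes the one-versus-one stopping rule only at a high level. The following is a minimal Python sketch of how such a rule could look, not the authors' implementation. For the empirical leader i and a rival j, keeping only the samples equal to i or j gives a Bernoulli sequence with parameter p_i / (p_i + p_j), and i beats j exactly when that parameter exceeds 1/2; the K = 2 case is this single-parameter problem. Each pairwise parameter gets a PPR confidence sequence {θ : π_0(θ) / π_t(θ) < 1/α}. The assumptions not taken from the abstract are a uniform Beta(1, 1) prior, an even union-bound split α = δ / (K - 1), and the hypothetical names `ppr_excludes_half`, `ppr_me_decision`, and `run_ppr_me`.

```python
import math
import random
from typing import Optional, Sequence, Tuple


def log_beta_density_at_half(s: int, f: int) -> float:
    """Log density of Beta(1 + s, 1 + f) at theta = 1/2.

    This is the posterior under a uniform Beta(1, 1) prior after s wins for the
    leader and f wins for the rival (assumed prior, not stated in the abstract).
    """
    log_norm = math.lgamma(s + f + 2) - math.lgamma(s + 1) - math.lgamma(f + 1)
    return log_norm + (s + f) * math.log(0.5)


def ppr_excludes_half(s: int, f: int, alpha: float) -> bool:
    """True if the PPR confidence set lies entirely above 1/2.

    With prior density 1, pi_0 / pi_t < 1/alpha  <=>  pi_t(theta) > alpha.
    The posterior is unimodal with mode s / (s + f), so that set is an interval
    around the mode; if the mode is above 1/2 and 1/2 is excluded, the whole
    interval is above 1/2.
    """
    return s > f and log_beta_density_at_half(s, f) <= math.log(alpha)


def ppr_me_decision(counts: Sequence[int], delta: float) -> Optional[int]:
    """One-versus-one check: return the leader if it beats every rival, else None.

    ASSUMPTION: delta is split evenly over the K - 1 pairwise comparisons.
    """
    k = len(counts)
    leader = max(range(k), key=lambda i: counts[i])
    alpha = delta / (k - 1)
    for j in range(k):
        if j != leader and not ppr_excludes_half(counts[leader], counts[j], alpha):
            return None
    return leader


def run_ppr_me(probs: Sequence[float], delta: float, seed: int = 0) -> Tuple[int, int]:
    """Sample i.i.d. from `probs` until the rule fires; return (mode estimate, samples used)."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    t = 0
    while True:
        x = rng.choices(range(len(probs)), weights=probs)[0]
        counts[x] += 1
        t += 1
        decision = ppr_me_decision(counts, delta)
        if decision is not None:
            return decision, t


mode, n = run_ppr_me([0.6, 0.3, 0.1], delta=0.05, seed=1)
print(mode, n)
```

The loop assumes the mode is unique: if two outcomes tie for the highest probability, their pairwise parameter is exactly 1/2 and the rule may never stop.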
