There is growing interest in estimating the number of unseen features, driven mostly by applications in the biological sciences. Recent work has highlighted both the upside and the downside of the popular stable-Beta process prior, and generalizations thereof, in Bayesian nonparametric inference for the unseen-features problem: i) the downside is the limited use of the sampling information in the posterior distributions, which depend on the observable sample only through the sample size; ii) the upside is the analytical tractability and interpretability of the posterior distributions, which are Poisson distributions whose parameters are simple to compute and depend on the sample size and the prior's parameter. In this paper, we introduce and investigate an alternative nonparametric prior, referred to as the stable-Beta scaled process prior, which is the first prior that enriches the posterior distribution of the number of unseen features by including the sampling information on the number of distinct features in the observable sample, while maintaining the same analytical tractability and interpretability as the stable-Beta process prior. Our prior leads to a negative binomial posterior distribution whose parameters depend on the sample size, the observed number of distinct features, and the prior's parameter, providing estimates that are simple, linear in the sampling information, and computationally efficient. We apply our approach to synthetic and real genetic data, showing that it outperforms parametric and nonparametric competitors in terms of estimation accuracy.