Transform-scaled process priors for trait allocations in Bayesian nonparametrics

Completely random measures (CRMs) provide a broad class of priors, arguably the most popular, for Bayesian nonparametric (BNP) analysis of trait allocations. A distinctive property of CRM priors is that they lead to predictive distributions sharing a common structure: for fixed prior parameters, a new data point exhibits a Poisson (random) number of "new" traits, i.e., traits not appearing in the sample, and this number depends on the sampling information only through the sample size. While the Poisson posterior distribution is appealing for its analytical tractability and ease of interpretation, its independence from the sampling information is a critical drawback, as it makes the posterior distribution of "new" traits completely determined by the estimation of the unknown prior parameters. In this paper, we introduce the class of transform-scaled process (T-SP) priors as a tool to enrich the posterior distribution of "new" traits arising from CRM priors, while maintaining the same analytical tractability and ease of interpretation. In particular, we present a framework for posterior analysis of trait allocations under T-SP priors, showing that Stable T-SP priors, i.e., T-SP priors built from Stable CRMs, lead to predictive distributions such that, for fixed prior parameters, a new data point displays a negative-Binomial (random) number of "new" traits, which depends on the sampling information through both the number of distinct traits and the sample size. Then, by relying on a hierarchical version of T-SP priors, we extend our analysis to the more general setting of trait allocations with multiple groups of data or subpopulations. The empirical effectiveness of our methods is demonstrated through numerical experiments and applications to real data.
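To make the contrast between the two predictive structures concrete, the following Python sketch may help. The Poisson part uses the standard Indian buffet process (Beta process CRM with mass `alpha`), where data point n+1 displays a Poisson number of new traits with mean `alpha / (n + 1)`. The negative-Binomial part is only a toy: the function `toy_negbin_params` and its parameter `shape0` are hypothetical and are not the Stable T-SP formulas from the paper; they are chosen solely so that the predictive depends on both the sample size `n` and the number of distinct traits `k`.

```python
import math


def poisson_pmf(j, mean):
    """P(J = j) for J ~ Poisson(mean), mean > 0, computed in log space."""
    return math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))


def neg_binomial_pmf(j, r, p):
    """P(J = j) for a negative Binomial with real shape r > 0 and
    success probability 0 < p < 1 (j counts failures)."""
    log_coef = math.lgamma(j + r) - math.lgamma(r) - math.lgamma(j + 1)
    return math.exp(log_coef + r * math.log(p) + j * math.log1p(-p))


def ibp_new_trait_mean(n, alpha):
    """Poisson mean of new traits for data point n + 1 under the standard
    Indian buffet process: it depends on the sample only through n."""
    return alpha / (n + 1)


def toy_negbin_params(n, k, shape0=1.0):
    """HYPOTHETICAL parameterization (not the paper's formulas): shape grows
    with the number k of distinct observed traits, and the success
    probability grows with n, giving mean (k + shape0) / (n + 1)."""
    r = k + shape0
    p = (n + 1) / (n + 2)
    return r, p


if __name__ == "__main__":
    n, alpha = 50, 5.0
    for k in (10, 200):  # same sample size, different numbers of distinct traits
        r, p = toy_negbin_params(n, k)
        print(
            f"k={k}: Poisson mean={ibp_new_trait_mean(n, alpha):.4f}, "
            f"toy NB mean={r * (1 - p) / p:.4f}, "
            f"P(no new trait): Poisson={poisson_pmf(0, ibp_new_trait_mean(n, alpha)):.4f}, "
            f"toy NB={neg_binomial_pmf(0, r, p):.4f}"
        )
```

With `alpha` held fixed, the Poisson predictive is identical for both values of `k`, whereas the toy negative-Binomial predictive shifts with `k`. This is the qualitative behaviour the abstract attributes to Stable T-SP priors, shown here under assumed parameters only.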
