The Pitman-Yor process is a random probability distribution that can be used as a prior distribution in nonparametric Bayesian analysis. The process is of species sampling type and generates discrete distributions, which yield of the order $n^\sigma$ distinct values ("species") in a random sample of size $n$ when the type parameter $\sigma$ is positive. This type parameter can thus be set to target true distributions of various levels of discreteness, which makes the Pitman-Yor process an interesting prior in this setting. It was previously shown that the resulting posterior distribution is consistent if and only if the true distribution of the data is discrete. In this paper we derive the distributional limit of the posterior distribution, in the form of a (corrected) Bernstein-von Mises theorem, which was previously known only in the continuous, inconsistent case. It turns out that the Pitman-Yor posterior distribution behaves well if the true distribution of the data is discrete with atoms that decrease not too slowly. In this case, credible sets derived from the posterior distribution provide valid frequentist confidence sets. For a general discrete distribution, the posterior distribution, although consistent, may contain a bias that does not converge to zero at the $\sqrt{n}$ rate and invalidates posterior inference. We propose a bias correction that solves this problem. We also consider the effect of estimating the type parameter from the data, both by empirical Bayes and full Bayes methods. In a small simulation study we illustrate that, without bias correction, the coverage of credible sets can be arbitrarily low, even for some discrete distributions.
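The growth of the number of distinct species with the sample size can be illustrated by simulation. The following is a minimal sketch, not taken from the paper: it assumes the standard two-parameter predictive rule (generalized Chinese restaurant process), in which, after $i$ observations containing $k$ species with counts $n_1, \dots, n_k$, the next observation is a new species with probability $(M + k\sigma)/(M + i)$ and equals species $j$ with probability $(n_j - \sigma)/(M + i)$. The concentration parameter `M`, the function name `pitman_yor_species_counts`, and the `seed` argument are illustrative assumptions that do not appear in the text above.

```python
import random


def pitman_yor_species_counts(n, sigma, M=1.0, seed=0):
    """Simulate n draws from the Pitman-Yor predictive rule.

    Returns the list of species counts; its length is the number of
    distinct species K_n. Assumed parametrization: type sigma in [0, 1)
    and concentration M > -sigma (M is not named in the abstract).
    """
    if not 0 <= sigma < 1:
        raise ValueError("sigma must lie in [0, 1)")
    if M <= -sigma:
        raise ValueError("M must exceed -sigma")
    rng = random.Random(seed)
    counts = []
    for i in range(n):  # i = number of observations drawn so far
        if i == 0:
            counts.append(1)  # the first observation always opens a species
            continue
        k = len(counts)
        new_weight = M + k * sigma
        # Total weight is new_weight + sum(c - sigma for c in counts) = M + i.
        u = rng.random() * (M + i)
        if u < new_weight:
            counts.append(1)  # new species
            continue
        u -= new_weight
        for j, c in enumerate(counts):
            u -= c - sigma  # existing species j has weight n_j - sigma
            if u < 0:
                counts[j] += 1
                break
        else:
            counts[-1] += 1  # guard against floating-point rounding
    return counts


if __name__ == "__main__":
    # Compare the number of distinct species for a few type parameters.
    for sigma in (0.2, 0.5, 0.8):
        k_n = len(pitman_yor_species_counts(5000, sigma))
        print(f"sigma={sigma}: K_n={k_n}, n^sigma={5000 ** sigma:.1f}")
```

Running the script prints the simulated number of species next to $n^\sigma$ for each type parameter. The two should agree only in order of magnitude, since the limit of $K_n / n^\sigma$ is random. The linear scan over species keeps the sketch short; it is not meant to be efficient for large $n$.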