Our goal is to develop a general strategy to decompose a random variable $X$ into multiple independent random variables, without sacrificing any information about unknown parameters. A recent paper showed that for some well-known natural exponential families, $X$ can be "thinned" into independent random variables $X^{(1)}, \ldots, X^{(K)}$, such that $X = \sum_{k=1}^K X^{(k)}$. In this paper, we generalize that procedure by relaxing the summation requirement and simply asking that some known function of the independent random variables exactly reconstruct $X$. This generalization serves two purposes. First, it greatly expands the families of distributions for which thinning can be performed. Second, it unifies sample splitting and data thinning, which seem very different on the surface, as applications of the same principle. This shared principle is sufficiency. We use this insight to perform generalized thinning operations for a diverse set of families.
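As a minimal sketch of the additive special case mentioned above, consider the standard Poisson example: if $X \sim \mathrm{Poisson}(\lambda)$ and we draw $X^{(1)} \mid X \sim \mathrm{Binomial}(X, \epsilon)$ and set $X^{(2)} = X - X^{(1)}$, then $X^{(1)}$ and $X^{(2)}$ are independent with $X^{(1)} \sim \mathrm{Poisson}(\epsilon\lambda)$ and $X^{(2)} \sim \mathrm{Poisson}((1-\epsilon)\lambda)$, and the known function that reconstructs $X$ is simply the sum. The function name `thin_count` and the parameter `eps` are illustrative assumptions, not notation from the paper.

```python
import random


def thin_count(x, eps=0.5, rng=None):
    """Split a non-negative integer count x into two parts that sum to x.

    Hypothetical helper for illustration: if x is a draw from Poisson(lam),
    the two parts are independent Poisson(eps * lam) and
    Poisson((1 - eps) * lam) draws.
    """
    rng = rng or random.Random()
    # X1 | X = x ~ Binomial(x, eps), drawn as a sum of x Bernoulli(eps) trials.
    x1 = sum(rng.random() < eps for _ in range(x))
    # The remainder; by construction x1 + x2 reconstructs x exactly.
    x2 = x - x1
    return x1, x2


rng = random.Random(0)
x = 17  # an observed count, assumed here to come from a Poisson model
x1, x2 = thin_count(x, eps=0.3, rng=rng)
assert x1 + x2 == x
```

Note that the split uses only the observed count and a chosen $\epsilon$, so it can be carried out without knowing $\lambda$.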