Suppose we observe a random vector $X$ from some distribution $P$ in a known family with unknown parameters. We ask the following question: when is it possible to split $X$ into two parts $f(X)$ and $g(X)$ such that neither part is sufficient to reconstruct $X$ by itself, but both together can recover $X$ fully, and the joint distribution of $(f(X),g(X))$ is tractable? As one example, if $X=(X_1,\dots,X_n)$ and $P$ is a product distribution, then for any $m<n$, we can split the sample to define $f(X)=(X_1,\dots,X_m)$ and $g(X)=(X_{m+1},\dots,X_n)$. Rasines and Young (2022) offer an alternative route to this goal by randomizing $X$ with additive Gaussian noise, which enables post-selection inference in finite samples for Gaussian-distributed data and asymptotically for non-Gaussian additive models. In this paper, we offer a more general methodology for achieving such a split in finite samples, borrowing ideas from Bayesian inference to yield a (frequentist) solution that can be viewed as a continuous analog of data splitting. We call our method data fission, as an alternative to data splitting, data carving, and p-value masking. We illustrate the method on a few prototypical applications, such as post-selection inference for trend filtering and other regression problems.
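To make the Gaussian randomization idea concrete, the following is a minimal sketch and not necessarily the exact construction used in the paper. It assumes $X \sim N(\mu, \sigma^2)$ with $\sigma$ known, draws independent noise $Z \sim N(0, \sigma^2)$, and sets $f(X) = X + \tau Z$ and $g(X) = X - Z/\tau$. The tuning parameter `tau`, the function names, and the simulated data are all illustrative assumptions. Under these assumptions $\mathrm{Cov}(f(X), g(X)) = \sigma^2 - \sigma^2 = 0$, so the two jointly Gaussian parts are independent, while $X = (f(X) + \tau^2 g(X))/(1+\tau^2)$ is recovered exactly from both.

```python
import numpy as np


def gaussian_fission(x, sigma, tau, rng):
    """Split x into two noisy parts (hypothetical helper, assumes known sigma)."""
    # Independent noise with the same standard deviation as the data.
    z = rng.normal(0.0, sigma, size=np.shape(x))
    f = x + tau * z   # part that could be used for selection
    g = x - z / tau   # part that could be used for inference
    return f, g


def recombine(f, g, tau):
    """Recover x exactly from both parts: f + tau^2 * g = (1 + tau^2) * x."""
    return (f + tau**2 * g) / (1.0 + tau**2)


# Simulated data; mu = 2 and sigma = 1 are arbitrary illustrative values.
rng = np.random.default_rng(0)
sigma, tau = 1.0, 1.0
x = rng.normal(loc=2.0, scale=sigma, size=100_000)

f, g = gaussian_fission(x, sigma, tau, rng)
x_rec = recombine(f, g, tau)
sample_corr = np.corrcoef(f, g)[0, 1]  # should be close to zero
```

In this sketch, a larger `tau` puts more noise into `f` and less into `g`, which loosely mirrors choosing a smaller $m$ in ordinary data splitting.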