Nonparametric Empirical Bayes Estimation on Heterogeneous Data

The simultaneous estimation of many parameters, based on data collected from corresponding studies, is a key research problem that has received renewed attention in the high-dimensional setting. Many practical situations involve heterogeneous data in which the heterogeneity is captured by a nuisance parameter. Effectively pooling information across samples while correctly accounting for heterogeneity is a significant challenge in large-scale estimation problems. We address this issue by introducing the "Nonparametric Empirical Bayes Structural Tweedie" (NEST) estimator, which efficiently estimates the unknown effect sizes and properly adjusts for heterogeneity via a generalized version of Tweedie's formula. For the normal means problem, NEST simultaneously handles the two main selection biases introduced by heterogeneity: (1) selection bias in the mean, which cannot be effectively corrected without also correcting for (2) selection bias in the variance. Our theoretical results show that NEST has strong asymptotic properties without requiring explicit assumptions about the prior. Extensions to other two-parameter members of the exponential family are discussed. Simulation studies show that NEST outperforms competing methods, with substantial efficiency gains in many settings. We demonstrate the proposed method by estimating the batting averages of baseball players and the Sharpe ratios of mutual fund returns.
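For orientation, the classical Tweedie formula that NEST generalizes can be sketched in a few lines. This is not the NEST estimator: it assumes a single known standard deviation `sigma` shared by all observations, whereas the abstract concerns heterogeneous variances. It computes x + sigma^2 * f'(x) / f(x), with the marginal density f replaced by a Gaussian kernel estimate. The function name `tweedie_estimate`, the default bandwidth rule, and the simulated data are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def tweedie_estimate(x, sigma, bandwidth=None):
    """Classical Tweedie estimate of normal means with a known, common sigma.

    Illustrative sketch only: the marginal density and its derivative are
    estimated with a Gaussian kernel, which is an assumption of this example.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if bandwidth is None:
        # Rule-of-thumb bandwidth; an illustrative choice, not from the paper.
        bandwidth = 1.06 * x.std() * n ** (-1 / 5)

    # Pairwise differences x_i - x_j and unnormalized Gaussian kernel weights.
    diff = x[:, None] - x[None, :]
    kernel = np.exp(-0.5 * (diff / bandwidth) ** 2)

    # Kernel estimates of f(x_i) and f'(x_i); the normalizing constant
    # cancels in the ratio f'/f, so it is omitted.
    density = kernel.mean(axis=1)
    d_density = (-(diff / bandwidth**2) * kernel).mean(axis=1)

    # Tweedie's formula: posterior mean = x + sigma^2 * d/dx log f(x).
    return x + sigma**2 * d_density / density


if __name__ == "__main__":
    # Simulated example: means drawn from N(0, 1), observed with unit noise.
    rng = np.random.default_rng(0)
    mu = rng.normal(0.0, 1.0, size=2000)
    x = mu + rng.normal(0.0, 1.0, size=2000)
    est = tweedie_estimate(x, sigma=1.0)
    print("MSE of raw observations:", np.mean((x - mu) ** 2))
    print("MSE of Tweedie estimate:", np.mean((est - mu) ** 2))
```

The sketch never specifies a prior for the means; the shrinkage comes entirely from the estimated marginal density, which is the sense in which such estimators are nonparametric empirical Bayes.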
