We consider the estimation of densities in multiple subpopulations, where the available sample size varies greatly across subpopulations. For example, in epidemiology, different diseases may share similar pathogenic mechanisms but differ in their prevalence. Without specifying a parametric form, our proposed approach pools information from the population and estimates the density in each subpopulation in a data-driven fashion. Low-dimensional approximating density families, in the form of exponential families, are constructed from the principal modes of variation in the log-densities; subpopulation densities are then fitted within these families based on likelihood principles and shrinkage. The approximating families become more flexible as the number of components increases and can approximate arbitrary infinite-dimensional densities with discrete observations, for which we derive convergence results. The proposed methods are shown to be interpretable and efficient in simulations as well as in applications to electronic medical records and rainfall data.
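To make the construction concrete, the following is a minimal sketch of one way the idea could be realized. It is not the authors' implementation. The assumptions are: densities live on a grid over [0, 1]; log-densities of the well-sampled subpopulations are already available on that grid; the principal modes are obtained by a plain SVD of centered log-densities; and shrinkage is a ridge penalty on the component scores. All names (`principal_modes`, `fit_theta`, `lam`) and the simulated data are hypothetical.

```python
import numpy as np

def principal_modes(log_dens, dx, K):
    """Mean and top-K modes of variation of log-densities given on a grid."""
    # Subtract each row's mean so additive normalizing constants drop out.
    centered = log_dens - log_dens.mean(axis=1, keepdims=True)
    mu = centered.mean(axis=0)
    _, _, vt = np.linalg.svd(centered - mu, full_matrices=False)
    phi = vt[:K] / np.sqrt(dx)  # rows have approximately unit L2 norm on the grid
    return mu, phi

def density(theta, mu, phi, dx):
    """Exponential-family density proportional to exp(mu + theta . phi)."""
    logf = mu + theta @ phi
    f = np.exp(logf - logf.max())
    return f / (f.sum() * dx)

def objective(theta, T, n, mu, phi, dx, lam):
    """Ridge-penalized log-likelihood, up to a constant not depending on theta."""
    logf = mu + theta @ phi
    m = logf.max()
    log_norm = m + np.log(np.exp(logf - m).sum() * dx)
    return theta @ T - n * log_norm - 0.5 * lam * theta @ theta

def fit_theta(obs, grid, mu, phi, dx, lam=1.0, iters=50):
    """Damped Newton ascent on the penalized log-likelihood (assumed fitting scheme)."""
    n = len(obs)
    # Sufficient statistics: each mode evaluated at the observations and summed.
    T = np.array([np.interp(obs, grid, p).sum() for p in phi])
    theta = np.zeros(len(phi))
    for _ in range(iters):
        f = density(theta, mu, phi, dx)
        mean = phi @ f * dx
        cov = (phi * f) @ phi.T * dx - np.outer(mean, mean)
        grad = T - n * mean - lam * theta
        hess = n * cov + lam * np.eye(len(phi))
        step = np.linalg.solve(hess, grad)
        t = 1.0
        current = objective(theta, T, n, mu, phi, dx, lam)
        # Halve the step until the objective does not decrease.
        while objective(theta + t * step, T, n, mu, phi, dx, lam) < current and t > 1e-8:
            t *= 0.5
        theta = theta + t * step
    return theta

# Simulated illustration (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 201)
dx = grid[1] - grid[0]
s, c = np.sin(2 * np.pi * grid), np.cos(2 * np.pi * grid)

# Thirty well-sampled subpopulations whose log-densities vary along two directions.
coef = rng.normal(size=(30, 2))
log_dens = coef[:, :1] * s + coef[:, 1:] * c
mu, phi = principal_modes(log_dens, dx, K=2)

# A sparsely sampled subpopulation with only 40 observations.
f_true = np.exp(1.0 * s - 0.5 * c)
obs = rng.choice(grid, size=40, p=f_true / f_true.sum())

theta_hat = fit_theta(obs, grid, mu, phi, dx, lam=1.0)
f_hat = density(theta_hat, mu, phi, dx)
```

In this sketch, a larger `lam` pulls the fitted scores toward zero, so a subpopulation with few observations is shrunk toward the density implied by the pooled mean log-density, while a larger `K` gives a more flexible family.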