Applications such as the analysis of microbiome data have led to renewed interest in statistical methods for compositional data, i.e., multivariate data in the form of probability vectors that contain relative proportions. In particular, there is considerable interest in modeling interactions among such relative proportions. To this end, we propose a class of exponential family models that accommodate general patterns of pairwise interaction while being supported on the probability simplex. Special cases include the family of Dirichlet distributions as well as Aitchison's additive logistic normal distributions. In general, the densities of the distributions we consider involve a normalizing constant that is difficult to compute. To circumvent this issue, we design effective estimation methods based on generalized versions of score matching. A high-dimensional analysis of our estimation methods shows that the simplex domain is handled as efficiently as previously studied full-dimensional domains.
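As a rough illustration of why score matching sidesteps the normalizing constant, the classical (Hyvärinen) objective for an unnormalized density on $\mathbb{R}^m$ can be sketched as follows. This is the standard unconstrained form, not the generalized simplex version the abstract refers to; the symbols $q_\theta$, $Z(\theta)$, and $p_0$ are notation assumed here for illustration only.

```latex
% Assumed notation: p_\theta(x) = q_\theta(x) / Z(\theta), where Z(\theta) is intractable
% and p_0 denotes the true data-generating density.
% The gradient in x of the log-density does not involve Z(\theta):
\nabla_x \log p_\theta(x) = \nabla_x \log q_\theta(x)

% Classical score matching objective (valid under regularity and boundary conditions,
% up to an additive constant that does not depend on \theta):
J(\theta) = \mathbb{E}_{x \sim p_0}\!\left[ \tfrac{1}{2}\,\bigl\|\nabla_x \log q_\theta(x)\bigr\|_2^2
            + \Delta_x \log q_\theta(x) \right]
```

Because only derivatives of $\log q_\theta$ with respect to $x$ appear, $Z(\theta)$ never needs to be evaluated. For exponential families, where $\log q_\theta$ is linear in $\theta$, the empirical version of this objective is quadratic in $\theta$.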