Assigning weights to a large pool of objects is a fundamental task in a wide variety of applications. In this article, we introduce the concept of structured high-dimensional probability simplexes, in which most components are zero or near zero and the remaining ones are close to each other. This structure is motivated by (1) the high-dimensional weights that are common in modern applications, and (2) the many examples in which equal weights, despite their simplicity, often achieve favorable or even state-of-the-art predictive performance. The structure, however, presents unique computational and statistical challenges. To address them, we propose a new class of double spike Dirichlet priors that shrink a probability simplex toward one with the desired structure. When applied to ensemble learning, such priors lead to a Bayesian method for structured high-dimensional ensembles that is useful for forecast combination and for improving random forests, while also enabling uncertainty quantification. We design efficient Markov chain Monte Carlo algorithms for easy implementation, and we establish posterior contraction rates to provide theoretical support. We demonstrate the wide applicability and competitive performance of the proposed methods through simulations and two real-data applications, using the European Central Bank Survey of Professional Forecasters dataset and a UCI dataset.
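To make the target structure concrete, the following minimal Python sketch builds a weight vector in which most entries are exactly zero and the few active entries are nearly equal, then uses it to combine a pool of forecasts. It only illustrates the structure described above; it is not the double spike Dirichlet prior or the posterior sampler. The names `structured_weights` and `combine`, the `jitter` parameter, and the simulated forecasts are all assumptions introduced for illustration.

```python
import random


def structured_weights(n_total, n_active, jitter=0.05, seed=0):
    """Return a length-n_total probability vector with n_active near-equal nonzero entries.

    Hypothetical helper: it illustrates the sparse, near-equal structure only
    and does not sample from any prior.
    """
    rng = random.Random(seed)
    active = rng.sample(range(n_total), n_active)
    raw = [0.0] * n_total
    for i in active:
        # Equal weight on each active component, perturbed by a small relative jitter.
        raw[i] = 1.0 + rng.uniform(-jitter, jitter)
    total = sum(raw)
    # Normalize so the weights are nonnegative and sum to one.
    return [r / total for r in raw]


def combine(forecasts, weights):
    """Weighted average of the individual forecasts (one number per forecaster)."""
    return sum(w * f for w, f in zip(weights, forecasts))


# Simulated pool of 1000 forecasters, of which only 10 receive nonzero weight.
weights = structured_weights(n_total=1000, n_active=10)
forecast_rng = random.Random(1)
forecasts = [forecast_rng.gauss(2.0, 0.5) for _ in range(1000)]
combined = combine(forecasts, weights)
```

Because the weights are nonnegative and sum to one, `combined` is a convex combination of the individual forecasts and therefore always lies between the smallest and largest of them.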