The behavior of many Bayesian models used in machine learning critically depends on the choice of prior distributions, controlled by hyperparameters that are typically selected by Bayesian optimization or cross-validation. This requires repeated, costly posterior inference. We provide an alternative for selecting good priors without carrying out posterior inference, building on the prior predictive distribution that marginalizes out the model parameters. We estimate virtual statistics for data generated by the prior predictive distribution and then optimize over the hyperparameters to find values for which these virtual statistics match target values provided by the user or estimated from (a subset of) the observed data. We apply the principle to probabilistic matrix factorization, for which good solutions for prior selection have been missing. We show that for Poisson factorization models we can analytically determine the hyperparameters, including the number of factors, that best replicate the target statistics, and we study empirically the sensitivity of the approach to model mismatch. We also present a model-independent procedure that determines the hyperparameters for general models by stochastic optimization, and demonstrate this extension in the context of hierarchical matrix factorization models.
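To make the matching idea concrete, the following is a minimal sketch (not the paper's implementation) of selecting Poisson factorization hyperparameters by comparing prior predictive statistics to targets. It uses simple random search in place of the paper's analytic solution or stochastic optimization, and the gamma prior parameterization, grid ranges, and chosen statistics (mean entry, fraction of zeros) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def prior_predictive_sample(n_rows, n_cols, n_factors, a, b, n_draws=20):
    """Draw data from the prior predictive of a simple Poisson factorization:
    theta ~ Gamma(a, rate=b), beta ~ Gamma(a, rate=b), Y ~ Poisson(theta @ beta.T)."""
    draws = []
    for _ in range(n_draws):
        theta = rng.gamma(shape=a, scale=1.0 / b, size=(n_rows, n_factors))
        beta = rng.gamma(shape=a, scale=1.0 / b, size=(n_cols, n_factors))
        draws.append(rng.poisson(theta @ beta.T))
    return np.stack(draws)

def virtual_stats(Y):
    """Summary statistics of the virtual data: mean entry and fraction of zeros."""
    return np.array([Y.mean(), (Y == 0).mean()])

# Target statistics, e.g. provided by the user or estimated
# from (a subset of) the observed data.
target = np.array([1.5, 0.60])

# Random search over hyperparameters, including the number of factors K.
best, best_loss = None, np.inf
for _ in range(200):
    K = int(rng.integers(1, 20))
    a = rng.uniform(0.1, 2.0)
    b = rng.uniform(0.1, 2.0)
    stats = virtual_stats(prior_predictive_sample(50, 40, K, a, b))
    loss = float(np.sum((stats - target) ** 2))
    if loss < best_loss:
        best, best_loss = (K, a, b), loss

print("selected (K, a, b):", best, "discrepancy:", best_loss)
```

The same loop applies to any model for which one can simulate from the prior predictive; only the simulator and the chosen statistics change, which is the sense in which the procedure is model independent.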