Statistical models often include thousands of parameters. However, large models decrease the investigator's ability to interpret and communicate the estimated parameters. Reducing the dimensionality of the parameter space in the estimation phase is a commonly used approach, but less work has focused on selecting subsets of the parameters for interpreting the estimated model, especially in settings such as Bayesian inference and model averaging. Importantly, many models do not have straightforward interpretations, which creates another layer of obfuscation. To address this gap, we introduce a new method that uses the Wasserstein distance to identify a low-dimensional, interpretable projection of the model. After estimating a complex model, users can budget how many parameters they wish to interpret, and the proposed method generates a simplified model of the desired dimension that minimizes the distance to the full model. We provide simulation results to illustrate the method and apply it to cancer datasets.