Graphical models are an important tool for exploring relationships between variables in complex, multivariate data. Methods for learning such graphical models are well developed in the case where all variables are either continuous or discrete, including in high-dimensional settings. However, in many applications the data span variables of different types (e.g. continuous, count, binary, ordinal), whose principled joint analysis is nontrivial. Latent Gaussian copula models, in which all variables are modeled as transformations of underlying jointly Gaussian variables, represent a useful approach. Recent advances have shown how the binary-continuous case can be tackled, but the general mixed-variable-type regime remains challenging. In this work, we make the simple yet useful observation that classical ideas concerning polychoric and polyserial correlations can be leveraged in a latent Gaussian copula framework. Building on this observation, we propose flexible and scalable methodology for data with variables of entirely general mixed type. We study the key properties of the proposed approaches theoretically and empirically, via extensive simulations as well as an illustrative application to data from the UK Biobank concerning COVID-19 risk factors.
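To make the link between polyserial correlation and the latent Gaussian copula concrete, the following is a minimal sketch of one classical two-step "ad hoc" polyserial estimator, not the authors' method. It assumes that (Z1, Z2) are standard bivariate normal with correlation rho, that the continuous variable is a monotone transform of Z1 (mapped back to the Gaussian scale here by rank-based normal scores, an assumed choice), and that the ordinal variable is obtained by cutting Z2 at unknown thresholds. Scoring the ordinal levels as 0..K-1 gives Cov(Z1, Y) = rho * sum_j phi(tau_j), which yields rho = corr(Z1, Y) * sd(Y) / sum_j phi(tau_j). All function names (`normal_scores`, `adhoc_polyserial`, `simulate`) are hypothetical.

```python
import math
import random
from statistics import NormalDist, pstdev

STD_NORMAL = NormalDist()  # standard normal: mu=0, sigma=1


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def normal_scores(x):
    """Map a continuous variable to the Gaussian scale via Phi^{-1}(rank / (n + 1)).
    Ties are broken by input order (assumption: few or no ties)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):
        z[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return z


def adhoc_polyserial(x, y):
    """Estimate the latent correlation between continuous x and ordinal y."""
    # Recode the ordinal levels to consecutive integers 0..K-1.
    levels = sorted(set(y))
    code = {level: k for k, level in enumerate(levels)}
    yk = [code[v] for v in y]
    n = len(yk)
    # Thresholds tau_j = Phi^{-1}(P(Y <= j)) from cumulative proportions.
    taus, cum = [], 0
    for k in range(len(levels) - 1):
        cum += yk.count(k)
        taus.append(STD_NORMAL.inv_cdf(cum / n))
    # Cov(Z1, Y) = rho * sum_j phi(tau_j) when Z1 is standard normal, so
    # rho = corr(z, y) * sd(y) / sum_j phi(tau_j); corr is scale-invariant.
    denom = sum(STD_NORMAL.pdf(t) for t in taus)
    z = normal_scores(x)
    return pearson(z, yk) * pstdev(yk) / denom


def simulate(n, rho, seed=0):
    """Toy latent Gaussian copula data: X = exp(Z1) is continuous and Y is an
    ordinal cut of Z2 at (-0.5, 0.7), with corr(Z1, Z2) = rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        xs.append(math.exp(z1))
        ys.append(0 if z2 < -0.5 else (1 if z2 < 0.7 else 2))
    return xs, ys
```

For example, `adhoc_polyserial(*simulate(20000, 0.6))` should return a value close to the latent correlation 0.6, even though neither observed variable is Gaussian. Full maximum-likelihood polyserial and polychoric estimators exist as alternatives; the two-step form is shown only because it is short and transparent.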