Statistical models are central to machine learning, with broad applicability across a range of downstream tasks. These models are controlled by free parameters that are typically estimated from data by maximum-likelihood estimation or approximations thereof. However, when faced with real-world datasets, many of these models run into a critical issue: they are formulated in terms of fully-observed data, whereas in practice datasets are plagued by missing values. The theory of statistical model estimation from incomplete data is conceptually similar to the estimation of latent-variable models, where powerful tools such as variational inference (VI) exist. However, in contrast to standard latent-variable models, parameter estimation with incomplete data often requires estimating exponentially many conditional distributions of the missing variables, which makes standard VI methods intractable. We address this gap by introducing variational Gibbs inference (VGI), a new general-purpose method to estimate the parameters of statistical models from incomplete data. We validate VGI on a set of synthetic and real-world estimation tasks, estimating important machine learning models such as VAEs and normalising flows from incomplete data. The proposed method, whilst general-purpose, achieves competitive or better performance than existing model-specific estimation methods.
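To make the "exponentially many conditionals" point concrete: with d variables that may each be missing, there are 2^d − 1 distinct missingness patterns, and each pattern induces its own conditional distribution p(x_missing | x_observed) that a naive VI approach would need a separate approximate posterior for. A minimal counting sketch (illustrative only; variable names are ours, not from the paper):

```python
from itertools import product

d = 10  # number of variables that may be missing

# Each binary mask marks which variables are missing; every distinct
# non-empty mask induces a different conditional p(x_missing | x_observed).
patterns = [mask for mask in product([0, 1], repeat=d) if any(mask)]

print(len(patterns))  # 2**d - 1 = 1023 distinct conditionals for just 10 variables
```

Even at d = 10 the count exceeds a thousand; at d = 100 it is astronomically large, which is why VGI avoids parameterising a separate variational distribution per missingness pattern.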