Statistical models are central to machine learning with broad applicability across a range of downstream tasks. The models are typically controlled by free parameters that are estimated from data by maximum-likelihood estimation. However, when faced with real-world datasets, many of these models run into a critical issue: they are formulated in terms of fully-observed data, whereas in practice the datasets are plagued with missing data. The theory of statistical model estimation from incomplete data is conceptually similar to the estimation of latent-variable models, where powerful tools such as variational inference (VI) exist. However, in contrast to standard latent-variable models, parameter estimation with incomplete data often requires estimating exponentially-many conditional distributions of the missing variables, hence making standard VI methods intractable. We address this gap by introducing variational Gibbs inference (VGI), a new general-purpose method to estimate the parameters of statistical models from incomplete data. We validate VGI on a set of synthetic and real-world estimation tasks, estimating important machine learning models (VAEs and normalising flows) from incomplete data. The proposed method, whilst general-purpose, achieves competitive or better performance than existing model-specific estimation methods.
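To see why the number of conditional distributions grows exponentially, note that each data vector of dimension d can exhibit any of the 2^d binary missingness patterns, and each distinct pattern defines its own conditional p(x_missing | x_observed). A minimal sketch (the helper name is illustrative, not from the paper):

```python
from itertools import product

def missingness_patterns(d):
    """Enumerate every binary missingness pattern for d variables.

    A 1 marks a missing entry. Each pattern with at least one
    missing variable defines a distinct conditional distribution
    p(x_missing | x_observed) that a naive VI scheme would need
    to approximate separately.
    """
    return [p for p in product((0, 1), repeat=d) if any(p)]

# There are 2**d - 1 patterns with at least one missing entry,
# so the number of required conditionals grows exponentially in d.
print(len(missingness_patterns(4)))   # 15 patterns for d = 4
print(len(missingness_patterns(10)))  # 1023 patterns for d = 10
```

Even for moderate d this count is infeasible to handle with one variational approximation per pattern, which is the gap VGI is designed to address.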