When a missing process depends on the missing values themselves, it needs to be explicitly modelled and taken into account while doing likelihood-based inference. We present an approach for building and fitting deep latent variable models (DLVMs) in cases where the missing process is dependent on the missing data. Specifically, a deep neural network enables us to flexibly model the conditional distribution of the missingness pattern given the data. This allows for incorporating prior information about the type of missingness (e.g. self-censoring) into the model. Our inference technique, based on importance-weighted variational inference, involves maximising a lower bound of the joint likelihood. Stochastic gradients of the bound are obtained by using the reparameterisation trick both in latent space and data space. We show on various kinds of data sets and missingness patterns that explicitly modelling the missing process can be invaluable.
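The importance-weighted bound with reparameterisation in both latent and data space can be illustrated on a toy problem. The sketch below is a minimal, self-contained example under assumed components: a one-dimensional latent, a Gaussian decoder, a logistic self-censoring missingness model, and a fixed stand-in encoder — none of these are the paper's actual architecture. Missing coordinates are sampled from the decoder (reparameterisation in data space), so their density terms cancel against the proposal and only the prior, the observed-data likelihood, the missingness model, and the encoder density enter the importance weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_norm(x, mu, sig):
    """Log-density of N(mu, sig^2), elementwise."""
    return -0.5 * np.log(2 * np.pi) - np.log(sig) - 0.5 * ((x - mu) / sig) ** 2

def log_sigmoid(t):
    """Numerically stable log(sigmoid(t))."""
    return -np.logaddexp(0.0, -t)

def not_miwae_bound(x1_obs, K=1000):
    """K-sample importance-weighted lower bound on log p(x_obs, s) for a
    toy model where x = (x1, x2), x2 is missing, i.e. s = (1, 0).

    Illustrative (assumed) generative model:
      z ~ N(0, 1)                        latent
      x_i | z ~ N(z, 1)                  decoder
      s_i | x_i ~ Bern(sigmoid(-2 x_i))  self-censoring: large values go missing
    Encoder stand-in: q(z | x1) = N(x1, 1), playing the role of a trained network.
    """
    # Reparameterisation in latent space: z = mu + sigma * eps
    eps = rng.standard_normal(K)
    z = x1_obs + eps
    # Reparameterisation in data space: impute the missing x2 from the decoder
    x2 = z + rng.standard_normal(K)
    # Importance weights; log p(x2 | z) cancels with the data-space proposal
    log_w = (log_norm(z, 0.0, 1.0)          # log p(z)
             + log_norm(x1_obs, z, 1.0)     # log p(x1 | z), observed coordinate
             + log_sigmoid(2.0 * x1_obs)    # log p(s1 = 1 | x1)
             + log_sigmoid(-2.0 * x2)       # log p(s2 = 0 | x2)
             - log_norm(z, x1_obs, 1.0))    # - log q(z | x1)
    # Bound: log (1/K) sum_k w_k, computed stably in log space
    return np.logaddexp.reduce(log_w) - np.log(K)
```

In a learning setting the weights would be kept in computational-graph form so that stochastic gradients of the bound flow through both reparameterised sampling steps; here the bound is only evaluated numerically.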