When modelling censored observations, a typical approach in current regression methods is to use a censored-Gaussian (i.e. Tobit) model to describe the conditional output distribution. In this paper, we argue that, as in the case of missing data, exploiting correlations between multiple outputs can enable models to better address the bias introduced by censored data. To do so, we introduce a heteroscedastic multi-output Gaussian process model that combines the non-parametric flexibility of GPs with the ability to leverage information from correlated outputs under input-dependent noise conditions. To address the resulting intractability of inference, we further devise a variational bound on the marginal log-likelihood that is suitable for stochastic optimization. We empirically evaluate our model against other generative models for censored data on both synthetic and real-world tasks, and further show how it can be generalized to deal with arbitrary likelihood functions. Results show that the added flexibility allows our model to better estimate the underlying non-censored (i.e. true) process under potentially complex censoring dynamics.
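To make the censored-Gaussian (Tobit) likelihood mentioned above concrete, the following is a minimal sketch rather than the model proposed in the paper. It assumes right-censoring (each censored observation is a lower bound on the true value) and takes a per-observation mean and standard deviation, which mirrors the heteroscedastic setting. The function names `normal_logpdf`, `normal_logsf`, and `tobit_loglik` are hypothetical and chosen for illustration only.

```python
import math


def normal_logpdf(y, mean, std):
    """Log density of a Gaussian N(mean, std^2) evaluated at y."""
    z = (y - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def normal_logsf(x, mean, std):
    """Log of P(Y > x) for Y ~ N(mean, std^2).

    Uses erfc for accuracy in the upper tail. For very large standardized
    values erfc underflows to 0 and math.log raises ValueError; a more
    robust version would need an asymptotic approximation.
    """
    z = (x - mean) / std
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))


def tobit_loglik(y, censored, mean, std):
    """Tobit log-likelihood under right-censoring (an assumption here).

    y        : observed values (equal to the threshold when censored)
    censored : booleans, True where the true value is only known to be >= y
    mean,std : per-observation Gaussian parameters (heteroscedastic noise)
    """
    total = 0.0
    for y_i, c_i, m_i, s_i in zip(y, censored, mean, std):
        if c_i:
            # Censored: only the tail mass above the threshold is known.
            total += normal_logsf(y_i, m_i, s_i)
        else:
            # Uncensored: ordinary Gaussian density.
            total += normal_logpdf(y_i, m_i, s_i)
    return total
```

In a Gaussian process setting, `mean` and `std` would be produced by latent functions of the input rather than fixed numbers; because the censored term is not Gaussian in those latent functions, exact inference is no longer available, which is the kind of intractability a variational bound is meant to address.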