Missing data is a prevalent practical problem in time series. Imputation methods are often applied to the full panel with the aim of training a model for a downstream out-of-sample task. For example, in finance, missing returns may be imputed before training a portfolio optimization model. Unfortunately, this practice may introduce a look-ahead bias into the future performance on the downstream task. There is an inherent trade-off between the look-ahead bias incurred by using the full data set for imputation and the larger variance of the imputation when only the training data are used. By connecting layers of information revealed in time, we propose a Bayesian posterior consensus distribution that optimally controls the trade-off between variance and look-ahead bias in the imputation. We demonstrate the benefit of our methodology on both synthetic and real financial data.
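The trade-off described above can be illustrated with a deliberately simple toy example. The sketch below is not the Bayesian posterior consensus method proposed here; it swaps in plain mean imputation, and the helper `impute_with_mean`, the simulated return series, and the positions of the missing values are all hypothetical. It only shows how fitting the imputation on the full panel lets post-training information leak into the training period, while fitting on the training window alone avoids the leak at the cost of estimating from fewer observations.

```python
import numpy as np


def impute_with_mean(series, fit_slice):
    """Fill NaNs in `series` with the mean of the observed values in `fit_slice`.

    Hypothetical helper for illustration only; plain mean imputation,
    not the posterior consensus distribution described in the abstract.
    """
    fill_value = np.nanmean(series[fit_slice])
    imputed = series.copy()
    imputed[np.isnan(imputed)] = fill_value
    return imputed, fill_value


rng = np.random.default_rng(0)
n_train, n_test = 60, 60

# Simulated returns (an assumption): the mean shifts upward after the training window.
returns = np.concatenate([
    rng.normal(0.00, 0.01, n_train),
    rng.normal(0.05, 0.01, n_test),
])

# Remove a few observations inside the training window.
missing_idx = np.array([5, 17, 33, 48])
observed = returns.copy()
observed[missing_idx] = np.nan

# Full-panel imputation: the fill value depends on future (test-period) data.
full_imputed, full_fill = impute_with_mean(observed, slice(None))

# Training-only imputation: no future data, but fewer observations to estimate from.
train_imputed, train_fill = impute_with_mean(observed, slice(0, n_train))

print(f"fill value from full panel:    {full_fill:.4f}")
print(f"fill value from training only: {train_fill:.4f}")
```

In this toy setting the full-panel fill value is pulled toward the higher post-training mean, which is the look-ahead bias; the training-only fill value is free of that leak but rests on a smaller sample, so it is a noisier estimate.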