Missing time-series data is a pervasive problem in prescriptive analytics models in operations management, healthcare, and finance. Imputation methods for time-series data are usually applied to the full panel, with the aim of training a prescriptive model for a downstream out-of-sample task. For example, missing asset returns may be imputed before an optimal portfolio allocation is estimated. However, this practice can introduce look-ahead bias into the estimated future performance of the downstream task. This creates an inherent trade-off between the look-ahead bias of imputing with the entire data set and the larger variance of imputing with only the training portion of the data set. By connecting layers of information revealed over time, we propose a Bayesian consensus posterior that fuses an arbitrary number of posteriors to optimize the trade-off between variance and look-ahead bias in the imputation. We derive tractable two-step optimization procedures for finding the optimal consensus posterior, using either the Kullback-Leibler divergence or the Wasserstein distance as the dissimilarity measure between posterior distributions. We demonstrate the benefit of our imputation mechanism for portfolio allocation with missing returns in simulations and in an empirical study.
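To make the idea of fusing posteriors concrete, the following is a minimal sketch and not the two-step procedure derived in the paper. It assumes univariate Gaussian posteriors and fixed, hand-picked weights, whereas the paper optimizes the consensus. The function names `wasserstein_barycenter` and `log_linear_pool`, the weights, and the numerical values are all hypothetical. Under these assumptions, the 2-Wasserstein barycenter averages means and standard deviations, and the minimizer of the weighted sum of KL(q || p_k) over q is the precision-weighted (log-linear) pool.

```python
from math import isclose


def wasserstein_barycenter(means, stds, weights):
    """2-Wasserstein barycenter of univariate Gaussians.

    For 1-D Gaussians the barycenter is Gaussian, with mean and standard
    deviation equal to the weighted averages of the inputs.
    Weights are assumed to be non-negative and to sum to one.
    """
    mu = sum(w * m for w, m in zip(weights, means))
    sigma = sum(w * s for w, s in zip(weights, stds))
    return mu, sigma


def log_linear_pool(means, stds, weights):
    """Minimizer of sum_k w_k * KL(q || p_k) over q for Gaussian p_k.

    The solution is Gaussian with precision sum_k w_k / s_k^2 and a
    precision-weighted mean. Weights are assumed to sum to one.
    """
    precisions = [w / s**2 for w, s in zip(weights, stds)]
    total = sum(precisions)
    mu = sum(p * m for p, m in zip(precisions, means)) / total
    return mu, total ** -0.5


# Hypothetical posteriors for one missing return:
#   index 0: fitted on the training window only (no look-ahead, higher variance)
#   index 1: fitted on the full panel (lower variance, uses future information)
means = [0.02, 0.01]
stds = [0.05, 0.02]
weights = [0.5, 0.5]  # assumed fixed here, not optimized

w2_mu, w2_sigma = wasserstein_barycenter(means, stds, weights)
kl_mu, kl_sigma = log_linear_pool(means, stds, weights)
print("Wasserstein consensus:", w2_mu, w2_sigma)
print("KL (log-linear) consensus:", kl_mu, kl_sigma)
```

Shifting weight toward the training-only posterior moves the consensus away from future information at the cost of a wider distribution, which is the trade-off the abstract describes.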