Missing data is common in machine learning applications, and time-series and sequence applications are no exception. This paper presents a novel Recurrent Neural Network (RNN) based solution for sequence prediction under missing data. Unlike all existing approaches, our method encodes the missingness patterns in the data directly, without imputing data either before or during model building. The encoding is lossless and achieves compression, and it can be employed for both sequence classification and forecasting. We focus here on forecasting, in the general setting of multi-step prediction in the presence of possible exogenous inputs. In particular, we propose novel variants of Encoder-Decoder (Seq2Seq) RNNs for this task. The encoder adopts the pattern encoding described above, while the decoder, which has a different structure, admits multiple variants. We demonstrate the utility of the proposed architecture through multiple experiments on both single-sequence and multiple-sequence real data sets. We consider two scenarios: (i) data is naturally missing and (ii) data is synthetically masked.
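To make the idea of a lossless, compressing missingness encoding concrete, the following is a minimal sketch of one possible scheme. It is an illustrative assumption, not necessarily the encoding used in the paper: missing time steps are dropped, and each observed value is paired with the number of missing steps immediately preceding it. The function names `encode_gaps` and `decode_gaps` are hypothetical.

```python
import math


def _is_missing(x):
    # Treat None and NaN as missing markers (an assumption of this sketch).
    return x is None or (isinstance(x, float) and math.isnan(x))


def encode_gaps(series):
    """Drop missing steps; pair each observed value with the count of
    missing steps immediately before it. Also return the trailing gap."""
    encoded = []
    gap = 0
    for x in series:
        if _is_missing(x):
            gap += 1
        else:
            encoded.append((x, gap))
            gap = 0
    return encoded, gap


def decode_gaps(encoded, trailing_gap):
    """Invert encode_gaps, restoring missing steps as None."""
    series = []
    for x, gap in encoded:
        series.extend([None] * gap)
        series.append(x)
    series.extend([None] * trailing_gap)
    return series


series = [1.0, None, None, 2.5, 3.0, None]
encoded, trailing = encode_gaps(series)
restored = decode_gaps(encoded, trailing)
```

In this sketch the encoded sequence has one entry per observed value, so it is shorter than the original whenever data is missing, and `decode_gaps` recovers the original positions exactly (with missing markers normalized to `None`). An encoder RNN could then consume the `(value, gap)` pairs step by step instead of an imputed full-length sequence.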