We introduce Conditional Self-Attention Imputation (CSAI), a novel recurrent neural network architecture designed to address the challenges of complex missing data patterns in multivariate time series derived from hospital electronic health records (EHRs). CSAI extends current state-of-the-art neural network-based imputation methods with three modifications adapted to EHR data characteristics: (a) an attention-based hidden state initialisation technique that captures both the long- and short-range temporal dependencies prevalent in EHRs, (b) a domain-informed temporal decay mechanism that adjusts the imputation process to clinical data recording patterns, and (c) a non-uniform masking strategy that models non-random missingness by calibrating weights according to both temporal and cross-sectional data characteristics. A comprehensive evaluation across four EHR benchmark datasets demonstrates CSAI's effectiveness compared to state-of-the-art neural architectures in data restoration and downstream predictive tasks. CSAI is also integrated within PyPOTS, an open-source Python toolbox for machine learning tasks on partially observed time series. This work advances neural network imputation for EHRs by aligning algorithmic imputation more closely with clinical realities.
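To make the idea of temporal decay concrete, the following is a minimal sketch of the generic decay used by earlier recurrent imputation models (GRU-D and BRITS style), not CSAI's domain-informed variant, whose details are not given in this abstract. The function names `time_gaps` and `decay`, the toy data, and the fixed constants `w` and `b` are assumptions for illustration; in a trained model the decay parameters would be learned.

```python
import math


def time_gaps(mask, timestamps):
    """Time elapsed since each feature was last observed.

    mask[t][d] is 1 if feature d was recorded at step t, else 0.
    timestamps[t] is the clock time of step t.
    """
    n_steps, n_feats = len(mask), len(mask[0])
    delta = [[0.0] * n_feats for _ in range(n_steps)]
    for t in range(1, n_steps):
        step = timestamps[t] - timestamps[t - 1]
        for d in range(n_feats):
            if mask[t - 1][d] == 1:
                # Previous step was observed: the gap restarts.
                delta[t][d] = step
            else:
                # Previous step was missing: the gap keeps accumulating.
                delta[t][d] = step + delta[t - 1][d]
    return delta


def decay(delta, w=0.5, b=0.0):
    """Decay factor gamma = exp(-max(0, w * delta + b)), in (0, 1].

    w and b are hypothetical fixed constants here (learned in practice).
    """
    return [[math.exp(-max(0.0, w * x + b)) for x in row] for row in delta]


# Toy example: 4 time steps, 2 features, irregular sampling.
mask = [[1, 1], [0, 1], [0, 0], [1, 1]]
timestamps = [0.0, 1.0, 2.0, 4.0]

delta = time_gaps(mask, timestamps)
gamma = decay(delta)
```

The longer a feature has gone unrecorded, the smaller its decay factor, so the model relies less on the last observed value. A domain-informed variant would presumably shape this decay using the typical recording frequency of each clinical variable, but that is an assumption beyond what this sketch implements.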