Electronic health records (EHRs) have become the foundation of machine learning applications in healthcare, while the utility of real patient records is often limited by privacy and security concerns. Synthetic EHR generation provides an additional perspective to compensate for this limitation. Most existing methods synthesize new records based on real EHR data, without consideration of different types of events in EHR data, which cannot control the event combinations in line with medical common sense. In this paper, we propose MSIC, a Multi-visit health Status Inference model for Collaborative EHR synthesis to address these limitations. First, we formulate the synthetic EHR generation process as a probabilistic graphical model and tightly connect different types of events by modeling the latent health states. Then, we derive a health state inference method tailored for the multi-visit scenario to effectively utilize previous records to synthesize current and future records. Furthermore, we propose to generate medical reports to add textual descriptions for each medical event, providing broader applications for synthesized EHR data. For generating different paragraphs in each visit, we incorporate a multi-generator deliberation framework to collaborate the message passing of multiple generators and employ a two-phase decoding strategy to generate high-quality reports. Our extensive experiments on the widely used benchmarks, MIMIC-III and MIMIC-IV, demonstrate that MSIC advances state-of-the-art results on the quality of synthetic data while maintaining low privacy risks.
翻译:暂无翻译