Importance: Severe mental illnesses (SMIs) affect approximately 3% of the total population of the United States. The ability to screen for SMI risk at large scale could inform early prevention and treatment. Objective: A scalable machine learning-based tool was developed to conduct population-level risk screening for SMIs, including schizophrenia, schizoaffective disorders, psychosis, and bipolar disorders, using 1) healthcare insurance claims and 2) electronic health records (EHRs). Design, setting, and participants: Data were drawn from beneficiaries of a nationwide commercial healthcare insurer with 77.4 million members and from EHRs of patients at eight academic hospitals based in the U.S. First, the predictive models were constructed and tested using data in case-control cohorts from insurance claims or EHR data. Second, the performance of the predictive models across data sources was analyzed. Third, as an illustrative application, the models were further trained to predict the risk of SMIs among 18-year-old young adults and individuals with substance-associated conditions. Main outcomes and measures: Machine learning-based predictive models for SMIs in the general population were built based on insurance claims and EHR data.