Health insurance claims data offer a unique opportunity to study disease distribution on a large scale, but analyzing these raw data accurately poses challenges. One important challenge is the accurate classification of study outcomes. For example, claims data provide no clear way to identify hospitalizations that are due to a specific event, because raw claims are typically disjointed and lack context. In this paper, we propose a framework for classifying hospitalizations due to a specific event. We then test this framework in a health insurance claims database of approximately 4 million US adults who tested positive for COVID-19 between March and December 2020. Finally, we compare our claims-based proportion of COVID-19-related hospitalizations, by age and sex, with nationally reported rates from the Centers for Disease Control and Prevention (CDC).