In 2020, the COVID-19 pandemic prompted a rapid response from governments and researchers worldwide. As of May 2022, over 6 million people had died of COVID-19 and there had been over 500 million confirmed cases, with many survivors going on to experience long-term effects weeks, months, or years after their illness. Despite this staggering toll, those who work with pandemic-relevant data often face significant systemic barriers to accessing, sharing, or re-using these data. In this paper, we report the results of a study in which we interviewed data professionals working with COVID-19-relevant data types, including social media, mobility, viral genome, testing, infection, hospital admission, and death data. These data types are variously used for modelling pandemic spread, monitoring strain on healthcare systems, and developing therapeutic treatments for COVID-19. Barriers to data access, sharing, and re-use include the cost of accessing data (primarily certain healthcare sources and mobility data from mobile phone carriers), human throughput bottlenecks, unclear pathways for requesting access to data, unnecessarily strict access controls and data re-use policies, unclear data provenance, the inability to link separate data sources that could collectively create a more complete picture, poor adherence to metadata standards, and a lack of machine-readable data formats.