The UK COVID-19 Vocal Audio Dataset is designed for training and evaluating machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms from vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022. This period was dominated by transmission of the Alpha and Delta SARS-CoV-2 variants and some Omicron sublineages. Audio recordings of volitional coughs, exhalations, and speech were collected in the 'Speak up to help beat coronavirus' digital survey, alongside demographic, self-reported symptom, and respiratory condition data, and were linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset is the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,794 of 72,999 participants and to 24,155 of 25,776 positive cases. Respiratory symptoms were reported by 45.62% of participants. The dataset also has potential uses for wider bioacoustics research: 11.30% of participants reported asthma, and 27.20% have linked influenza PCR test results.