Unified Acoustic Representations for Screening Neurological and Respiratory Pathologies from Voice

Voice-based health assessment offers unprecedented opportunities for scalable, non-invasive disease screening, yet existing approaches typically focus on single conditions and fail to leverage the rich, multi-faceted information embedded in speech. We present MARVEL (Multi-task Acoustic Representations for Voice-based Health Analysis), a privacy-conscious multitask learning framework that simultaneously detects nine distinct neurological, respiratory, and voice disorders using only derived acoustic features, eliminating the need for raw audio transmission. Our dual-branch architecture employs specialized encoders with task-specific heads sharing a common acoustic backbone, enabling effective cross-condition knowledge transfer. Evaluated on the large-scale Bridge2AI-Voice v2.0 dataset, MARVEL achieves an overall AUROC of 0.78, with exceptional performance on neurological disorders (AUROC = 0.89), particularly for Alzheimer's disease/mild cognitive impairment (AUROC = 0.97). Our framework consistently outperforms single-modal baselines by 5-19% and surpasses state-of-the-art self-supervised models on 7 of 9 tasks. Correlation analysis further shows that the learned representations exhibit meaningful similarities with established acoustic features, indicating that the model's internal representations are consistent with clinically recognized acoustic patterns. By demonstrating that a single unified model can effectively screen for diverse conditions, this work establishes a foundation for deployable voice-based diagnostics in resource-constrained and remote healthcare settings.
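Because the results above are reported as per-task and overall AUROC, it may help to recall what the metric measures. The following is a minimal sketch, not the authors' evaluation code: it computes AUROC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, then takes an unweighted mean across tasks. Whether the reported overall AUROC of 0.78 is such a macro-average is an assumption here, and the task names, labels, and scores are made-up toy values rather than data from Bridge2AI-Voice.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison: the fraction of (positive, negative)
    pairs in which the positive example has the higher score.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auroc(per_task):
    """Unweighted mean of per-task AUROCs.
    per_task maps a task name to a (labels, scores) pair.
    Assumption: every task contributes equally to the overall figure."""
    by_task = {task: auroc(y, s) for task, (y, s) in per_task.items()}
    return sum(by_task.values()) / len(by_task), by_task


# Toy, made-up data for two hypothetical tasks (NOT results from the paper).
toy = {
    "task_a": ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    "task_b": ([1, 0, 1, 0], [0.6, 0.7, 0.8, 0.2]),
}
overall, by_task = macro_auroc(toy)
print(by_task, overall)
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a rank-based implementation would be preferable for a large dataset.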
