Stroke triage, or screening, is a common challenge in the emergency room (ER). A quick CT scan is usually performed instead of MRI because of MRI's slow throughput and high cost. Clinical tests are commonly used during triage, but the misdiagnosis rate remains high. We propose DeepStroke, a novel multimodal deep learning framework for computer-aided assessment of stroke presence that recognizes patterns of subtle facial muscle incoordination and speech impairment in patients with suspected stroke in an acute setting. DeepStroke takes one-minute facial video and audio data, readily available during stroke triage, for local facial paralysis detection and global speech disorder analysis. Transfer learning is adopted to reduce face-attribute biases and improve generalizability. A multimodal lateral fusion combines low- and high-level features and provides mutual regularization for joint training. Novel adversarial training is introduced to obtain identity-free and stroke-discriminative features. Experiments on our video-audio dataset of actual ER patients show that DeepStroke outperforms state-of-the-art models and performs better than both a triage team and ER doctors, attaining 10.94% higher sensitivity and 7.37% higher accuracy than traditional stroke triage when specificity is aligned. Each assessment can be completed in less than six minutes, demonstrating the framework's potential for clinical translation.
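One way to read the comparison "when specificity is aligned" is that the model's decision threshold is chosen so that its specificity matches that of the clinical baseline, after which sensitivity and accuracy are compared at that threshold. The following is a minimal sketch of that reading, not the authors' evaluation code: it assumes binary labels (1 = stroke) and one model score per patient, and the function names and toy data are hypothetical.

```python
from typing import Dict, List, Tuple


def confusion(labels: List[int], preds: List[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels where 1 = stroke."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return tp, fp, tn, fn


def metrics(labels: List[int], preds: List[int]) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy of binary predictions."""
    tp, fp, tn, fn = confusion(labels, preds)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(labels),
    }


def threshold_at_specificity(
    labels: List[int], scores: List[float], target_spec: float
) -> float:
    """Lowest threshold whose specificity is at least target_spec.

    Specificity never decreases as the threshold rises, so the first
    candidate that reaches the target keeps sensitivity as high as possible.
    """
    for t in sorted(set(scores)) + [float("inf")]:
        preds = [1 if s >= t else 0 for s in scores]
        if metrics(labels, preds)["specificity"] >= target_spec:
            return t
    return float("inf")


# Hypothetical toy data: 4 stroke and 4 non-stroke patients.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
triage_preds = [1, 1, 0, 0, 0, 0, 0, 1]                # baseline decisions
model_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]

baseline = metrics(labels, triage_preds)
t = threshold_at_specificity(labels, model_scores, baseline["specificity"])
model = metrics(labels, [1 if s >= t else 0 for s in model_scores])
print(baseline, t, model)
```

On real data the two specificities will rarely match exactly, so the sketch settles for the nearest threshold that is at least as specific as the baseline; interpolating along the ROC curve would be another reasonable choice.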