There has been rapidly growing interest in Automatic Diagnosis (AD) and Automatic Symptom Detection (ASD) systems in the machine learning research literature, with the aim of assisting doctors in telemedicine services. These systems are designed to interact with patients, collect evidence relevant to their concerns, and make predictions about the underlying diseases. Doctors would review the interaction, including the evidence and the predictions, before making their final decisions. Despite recent progress, an important part of doctors' interactions with patients is missing from the design of AD and ASD systems, namely the differential diagnosis. Its absence is largely due to the lack of datasets that include such information for models to train on. In this work, we present a large-scale synthetic dataset that includes a differential diagnosis, along with the ground truth pathology, for each patient. In addition, this dataset includes more pathologies, as well as more types of symptoms and antecedents. As a proof of concept, we extend several existing AD and ASD systems to incorporate differential diagnosis, and provide empirical evidence that using differentials as training signals is essential for such systems to learn to predict differentials. The dataset is available at https://github.com/bruzwen/ddxplus
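To illustrate what using a differential as a training signal could mean in practice, the following is a minimal sketch, not the method of any of the extended systems. It assumes the differential is represented as a probability distribution over pathologies and compares a cross-entropy loss computed against that distribution with one computed against the ground truth pathology alone. The pathology names, probabilities, and logits are hypothetical values chosen for illustration.

```python
import math

def softmax(logits):
    """Convert raw model scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, logits):
    """Cross-entropy between a target distribution and the model's softmax output."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target_probs, probs) if t > 0)

# Hypothetical label space and targets (assumptions, not taken from the dataset).
pathologies = ["pathology_a", "pathology_b", "pathology_c"]
ground_truth_target = [1.0, 0.0, 0.0]   # one-hot: only the ground truth pathology
differential_target = [0.6, 0.3, 0.1]   # soft: a ranked differential with probabilities

# Hypothetical model scores for one patient.
logits = [2.0, 1.0, -1.0]

loss_ground_truth = cross_entropy(ground_truth_target, logits)
loss_differential = cross_entropy(differential_target, logits)

print(f"loss vs. ground truth only: {loss_ground_truth:.4f}")
print(f"loss vs. differential:      {loss_differential:.4f}")
```

With the one-hot target, the loss only rewards probability mass placed on the ground truth pathology. With the soft target, the loss is minimized when the predicted distribution matches the whole differential, which is one way a model could be pushed to predict differentials rather than a single disease.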