Recent general-domain large language models (LLMs), such as ChatGPT, have shown remarkable success in following instructions and producing human-like responses. However, these models have not been tailored to the medical domain, which results in poor answer accuracy and an inability to give plausible recommendations for medical diagnoses, medications, and so on. To address this issue, we collected more than 700 diseases together with their corresponding symptoms, required medical tests, and recommended medications, and from this information generated 5K doctor-patient conversations. In addition, we obtained 200K real patient-doctor conversations from online medical Q&A consultation sites. Models fine-tuned on these 205K doctor-patient conversations show great potential to understand patients' needs, provide informed advice, and offer valuable assistance in a variety of medical fields. Integrating such language models into healthcare could transform the way healthcare professionals and patients communicate, ultimately improving the overall efficiency and quality of patient care and outcomes. To facilitate the further development of dialogue models in the medical field, we have made all source code, datasets, and model weights publicly available at: https://github.com/Kent0n-Li/ChatDoctor.