In rural regions of several developing countries, quality healthcare, medical infrastructure, and professional diagnosis are largely unavailable. Many of these regions are gradually gaining access to internet infrastructure, although the connections are not yet strong enough to sustain communication with a medical practitioner. Several of the deaths that result from this lack of medical access, the absence of patients' previous health records, and the unavailability of information in indigenous languages are easily preventable. In this paper, we describe an approach that leverages recent progress in Machine Learning and Natural Language Processing (NLP) to design a low-resource, multilingual model that serves as a preliminary first-point-of-contact medical assistant. Our contributions include defining the NLP pipeline required for named-entity recognition, language-agnostic sentence embedding, natural language translation, information retrieval, question answering, and generative pre-training for final query processing. We obtain promising results for this pipeline, along with preliminary results for Electronic Health Record (EHR) analysis with text summarization, which medical practitioners can consult during diagnosis. Through this NLP pipeline, we aim to provide preliminary medical information to the user; we do not claim to supplant diagnosis by qualified medical practitioners. Using input from subject matter experts, we have compiled a large corpus to pre-train and fine-tune our BioBERT-based NLP model for the specific tasks. We expect recent advances in NLP architectures, several of which are efficient and privacy-preserving, to further the impact of our solution and to improve individual task performance.
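To make the information-retrieval stage of the pipeline more concrete, the following is a minimal sketch of how a user query could be matched against stored medical passages. It is not the authors' implementation: the `PASSAGES` list, the `embed`, `cosine`, and `retrieve` helpers, and the bag-of-words representation are all hypothetical stand-ins chosen so the example runs with the standard library alone. The system described in the abstract would use language-agnostic sentence embeddings and a BioBERT-based model in place of `embed`.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base (an assumption for illustration only;
# the corpus compiled by the authors is not reproduced here).
PASSAGES = [
    "Oral rehydration solution helps treat dehydration caused by diarrhea.",
    "Malaria is transmitted by mosquito bites and commonly causes fever and chills.",
    "Wash a minor wound with clean water and cover it with a sterile dressing.",
]


def embed(text):
    """Stand-in embedding: lowercase bag-of-words counts.

    This is NOT a neural sentence embedding; it only mimics the interface
    (text in, vector-like object out) so the retrieval step can be shown.
    """
    tokens = [t.strip(".,?!").lower() for t in text.split()]
    return Counter(t for t in tokens if t)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, passages=PASSAGES):
    """Return the passage whose stand-in embedding is closest to the query."""
    q = embed(query)
    return max(passages, key=lambda p: cosine(q, embed(p)))


if __name__ == "__main__":
    print(retrieve("What causes fever and chills?"))
    print(retrieve("How do I clean a wound?"))
```

In a fuller pipeline, the retrieved passage would then be passed to the question-answering and generation stages; swapping `embed` for a multilingual sentence encoder is what would let queries in indigenous languages match passages written in another language.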