Improving healthcare quality and access remains a critical concern for countries worldwide. Consequently, the rise of large language models (LLMs) has sparked a wealth of discussion around healthcare applications among researchers and consumers alike. While the ability of these models to pass medical exams has been used to argue in favour of their use in medical training and diagnosis, the impact of their inevitable use as a self-diagnostic tool and their role in spreading healthcare misinformation have not been evaluated. In this work, we critically evaluate LLMs' capabilities through the lens of a general user attempting self-diagnosis, as well as the means through which LLMs may aid in the spread of medical misinformation. To accomplish this, we develop a testing methodology for evaluating responses to open-ended questions that mimic real-world use cases. In doing so, we reveal that a) these models perform worse than previously reported, and b) they exhibit peculiar behaviours, including overconfidence when stating incorrect recommendations, which increases the risk of spreading medical misinformation.