Large language models (LLMs) have recently shown strong performance on tasks across many domains, but they struggle with chemistry-related problems. Moreover, these models lack access to external knowledge sources, which limits their usefulness in scientific applications. In this study, we introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery, and materials design. By integrating 13 expert-designed tools, ChemCrow augments LLM performance in chemistry, and new capabilities emerge. Our evaluation, which includes both LLM and expert human assessments, demonstrates ChemCrow's effectiveness in automating a diverse set of chemical tasks. Surprisingly, we find that GPT-4 as an evaluator cannot distinguish between clearly wrong GPT-4 completions and GPT-4 + ChemCrow performance. There is a significant risk that tools like ChemCrow will be misused, and we discuss their potential harms. Employed responsibly, ChemCrow not only aids expert chemists and lowers barriers for non-experts, but also fosters scientific advancement by bridging the gap between experimental and computational chemistry.