Large language models (LLMs) have recently shown strong performance on tasks across domains, but they struggle with chemistry-related problems. Moreover, these models lack access to external knowledge sources, which limits their usefulness in scientific applications. In this study, we introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery, and materials design. By integrating 13 expert-designed tools, ChemCrow augments LLM performance in chemistry, and new capabilities emerge. Our evaluation, which includes both LLM and expert human assessments, demonstrates ChemCrow's effectiveness in automating a diverse set of chemical tasks. Surprisingly, we find that GPT-4 as an evaluator cannot distinguish between clearly wrong GPT-4 completions and GPT-4 + ChemCrow performance. There is a significant risk of misuse of tools like ChemCrow, and we discuss their potential harms. Employed responsibly, ChemCrow not only aids expert chemists and lowers barriers for non-experts, but also fosters scientific advancement by bridging the gap between experimental and computational chemistry.