Data are of growing significance in the exploration of cutting-edge materials, and a number of datasets have been generated either manually or through automated approaches. However, the materials science field struggles to effectively utilize this abundance of data, especially in applied disciplines where materials are evaluated based on device performance rather than their intrinsic properties. This article presents a new natural language processing (NLP) task called structured information inference (SII) to address the complexities of information extraction at the device level in materials science. We accomplished this task by fine-tuning GPT-3 on an existing perovskite solar cell FAIR (Findable, Accessible, Interoperable, Reusable) dataset, achieving an F1-score of 91.8%, and extended the dataset with data published since its release. The resulting data are formatted and normalized, enabling their direct use as input for subsequent data analysis. This capability empowers materials scientists to develop models by selecting high-quality review articles within their domain. Additionally, we designed experiments that use large language models (LLMs) to predict the electrical performance of solar cells and to design materials or devices with targeted parameters. Our results demonstrate performance comparable to that of traditional machine learning methods without feature selection, highlighting the potential of LLMs to acquire scientific knowledge and design new materials much as materials scientists do.
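To make the SII setup concrete, the sketch below shows one plausible way to serialize a device-level record from a perovskite solar cell FAIR dataset into a prompt/completion pair for GPT-3 fine-tuning. The field names, output schema, and prompt separator are illustrative assumptions, not the authors' exact format.

```python
import json

# Hypothetical example: convert one device-level record into a
# prompt/completion pair for fine-tuning on the SII task.
# Field names and the output schema are illustrative assumptions only.
record = {
    "passage": (
        "The champion device with a Cs0.05FA0.95PbI3 absorber and a "
        "spiro-OMeTAD hole transport layer achieved a PCE of 21.3% "
        "with a Voc of 1.12 V."
    ),
    "structured": {
        "perovskite_composition": "Cs0.05FA0.95PbI3",
        "hole_transport_layer": "spiro-OMeTAD",
        "pce_percent": 21.3,
        "voc_volt": 1.12,
    },
}

def to_finetune_example(rec: dict) -> dict:
    """Serialize a record as one JSONL line of prompt/completion text."""
    return {
        # A fixed separator marks the end of the input passage.
        "prompt": rec["passage"] + "\n\n###\n\n",
        # The completion is the normalized, structured output as JSON.
        "completion": " " + json.dumps(rec["structured"]),
    }

with open("sii_train.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(to_finetune_example(record)) + "\n")
```

Because the completion is normalized JSON rather than free text, the model's outputs can be parsed directly into the tabular form needed for downstream analysis, which is the property the abstract emphasizes.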