Large Language Models as Master Key: Unlocking the Secrets of Materials Science with GPT


Large language models · Language models · Perovskite solar cells · Multiple datasets · Design

April 11, 2023


Tong Xie, Yuwei Wan, Wei Huang, Yufei Zhou, Yixuan Liu, Qingyuan Linghu, Shaozhou Wang, Chunyu Kit, Clara Grazian, Wenjie Zhang, Bram Hoex

Data volume is increasingly important in exploring cutting-edge materials, and many datasets have been generated by hand or by automated approaches. However, the materials science field struggles to use this abundance of data effectively, especially in applied disciplines where materials are evaluated by device performance rather than by their properties. This article presents a new natural language processing (NLP) task called structured information inference (SII) to address the complexities of information extraction at the device level in materials science. We accomplished this task by fine-tuning GPT-3 on an existing perovskite solar cell FAIR (Findable, Accessible, Interoperable, Reusable) dataset, reaching a 91.8% F1-score, and extended the dataset with data published since its release. The produced data is formatted and normalized, so it can be used directly as input to subsequent data analysis. This lets materials scientists develop models by selecting high-quality review articles within their domain. Additionally, we designed experiments that use large language models (LLMs) to predict the electrical performance of solar cells and to design materials or devices with targeted parameters. Our results demonstrate performance comparable to traditional machine learning methods without feature selection, highlighting the potential of LLMs to acquire scientific knowledge and design new materials much as materials scientists do.

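The abstract reports an F1-score for structured output but does not say how it was computed. The sketch below shows one plausible reading, a micro-averaged F1 over (field, value) pairs of structured device records. The field names (`perovskite`, `etl`, `htl`), the `normalize` helper, and the matching rule are all hypothetical and chosen for illustration; they are not taken from the paper or from the dataset schema.

```python
from typing import Dict, List, Tuple

# A structured device record: field name -> extracted value (hypothetical schema).
Record = Dict[str, str]


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(value.lower().split())


def micro_f1(gold: List[Record], pred: List[Record]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) pooled over all (field, value) pairs.

    Assumption: a predicted pair counts as correct only if the same field
    carries the same normalized value in the matching gold record.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g_pairs = {(k, normalize(v)) for k, v in g.items()}
        p_pairs = {(k, normalize(v)) for k, v in p.items()}
        tp += len(g_pairs & p_pairs)   # pairs present in both
        fp += len(p_pairs - g_pairs)   # predicted but not in gold
        fn += len(g_pairs - p_pairs)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy example: one device, one wrong field (the electron transport layer).
gold = [{"perovskite": "MAPbI3", "etl": "TiO2", "htl": "Spiro-OMeTAD"}]
pred = [{"perovskite": "mapbi3", "etl": "SnO2", "htl": "Spiro-OMeTAD"}]
precision, recall, f1 = micro_f1(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Pooling counts across records (micro-averaging) weights every extracted field equally; a per-field macro average would be an equally reasonable choice and may give a different number.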
