Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4 - 专知论文

会员服务 ·

0

MoDELS · 推断 · Performer · Better · 测试数据 ·

2023 年 4 月 28 日

Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4

翻译：暂无翻译

Kent K. Chang,Mackenzie Cramer,Sandeep Soni,David Bamman

In this work, we carry out a data archaeology to infer books that are known to ChatGPT and GPT-4 using a name cloze membership inference query. We find that OpenAI models have memorized a wide collection of copyrighted materials, and that the degree of memorization is tied to the frequency with which passages of those books appear on the web. The ability of these models to memorize an unknown set of books complicates assessments of measurement validity for cultural analytics by contaminating test data; we show that models perform much better on memorized books than on non-memorized books for downstream tasks. We argue that this supports a case for open models whose training data is known.

翻译：暂无翻译

0

相关内容

MoDELS

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【吴恩达新课程】ChatGPT提示工程，ChatGPT Prompt Engineering for Developers

【吴恩达新课程】ChatGPT提示工程，ChatGPT Prompt Engineering for Developers

专知会员服务

104+阅读 · 2023年4月28日

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

126+阅读 · 2022年4月21日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

博后招募 | 杜克大学医学院Ethan Fang课题组招募数据科学方向博士后

博后招募 | 杜克大学医学院Ethan Fang课题组招募数据科学方向博士后

PaperWeekly

0+阅读 · 2022年8月26日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

大数据 | 顶级SCI期刊专刊/国际会议信息7条

大数据 | 顶级SCI期刊专刊/国际会议信息7条

Call4Papers

10+阅读 · 2018年12月29日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

Perilipin-5蛋白调控肝星状细胞激活和高脂饮食性非酒精性脂肪肝的机制

国家自然科学基金

0+阅读 · 2014年12月31日

糖基化终末产物在糖尿病性角膜上皮病变中的作用及其机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

高维近似因子模型框架下的多重检验及其应用

国家自然科学基金

0+阅读 · 2013年12月31日

兼具诱导成骨和成血管的钛材表面工程研究

国家自然科学基金

0+阅读 · 2012年12月31日

Galectin-7在哮喘发病中的调控以及作用机制

国家自然科学基金

0+阅读 · 2012年12月31日

基于冷原子干涉的Casimir-Polder效应研究

国家自然科学基金

0+阅读 · 2011年12月31日

Septin4对日本血吸虫肝纤维化中肝星状细胞的作用及机制的研究

国家自然科学基金

0+阅读 · 2011年12月31日

几何阻挫体系ATO2中自旋、电荷、轨道序及其相互作用研究

国家自然科学基金

0+阅读 · 2011年12月31日

离子通道TRPM2在血管壁内膜增生中的作用

国家自然科学基金

0+阅读 · 2011年12月31日

赤水金钗石斛对实验性糖尿病肾病的防治作用

国家自然科学基金

0+阅读 · 2009年12月31日

Can ChatGPT pass the Vietnamese National High School Graduation Examination?

Arxiv

0+阅读 · 2023年6月15日

Do Software Security Practices Yield Fewer Vulnerabilities?

Arxiv

0+阅读 · 2023年6月15日

Neural Fields with Hard Constraints of Arbitrary Differential Order

Arxiv

0+阅读 · 2023年6月15日

ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations

Arxiv

0+阅读 · 2023年6月13日

Exploring User Perspectives on ChatGPT: Applications, Perceptions, and Implications for AI-Integrated Education

Arxiv

0+阅读 · 2023年6月13日

Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard

Arxiv

0+阅读 · 2023年6月13日

Machine Learning Approach on Multiclass Classification of Internet Firewall Log Files

Arxiv

0+阅读 · 2023年6月12日

A Survey on ChatGPT: AI-Generated Contents, Challenges, and Solutions

Arxiv

54+阅读 · 2023年5月25日

Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond

Arxiv

12+阅读 · 2023年4月26日

Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning

Arxiv

14+阅读 · 2022年3月25日

VIP会员

文章信息

相关主题

相关VIP内容

【吴恩达新课程】ChatGPT提示工程，ChatGPT Prompt Engineering for Developers

【吴恩达新课程】ChatGPT提示工程，ChatGPT Prompt Engineering for Developers

专知会员服务

104+阅读 · 2023年4月28日

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

126+阅读 · 2022年4月21日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

大语言模型智能体强化学习：全景综述

《城市滨海地区：理解复杂多变环境下的指挥控制框架》50页报告

【伯克利博士论文】从推理服务到训练：面向大规模 LLM 智能体的高效系统

美空军“顶点2025”实验：推进AI在C2、动态目标锁定与联盟集成中的应用

相关资讯

博后招募 | 杜克大学医学院Ethan Fang课题组招募数据科学方向博士后

博后招募 | 杜克大学医学院Ethan Fang课题组招募数据科学方向博士后

PaperWeekly

0+阅读 · 2022年8月26日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

大数据 | 顶级SCI期刊专刊/国际会议信息7条

大数据 | 顶级SCI期刊专刊/国际会议信息7条

Call4Papers

10+阅读 · 2018年12月29日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

相关论文

Can ChatGPT pass the Vietnamese National High School Graduation Examination?

Arxiv

0+阅读 · 2023年6月15日

Do Software Security Practices Yield Fewer Vulnerabilities?

Arxiv

0+阅读 · 2023年6月15日

Neural Fields with Hard Constraints of Arbitrary Differential Order

Arxiv

0+阅读 · 2023年6月15日

ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations

Arxiv

0+阅读 · 2023年6月13日

Exploring User Perspectives on ChatGPT: Applications, Perceptions, and Implications for AI-Integrated Education

Arxiv

0+阅读 · 2023年6月13日

Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard

Arxiv

0+阅读 · 2023年6月13日

Machine Learning Approach on Multiclass Classification of Internet Firewall Log Files

Arxiv

0+阅读 · 2023年6月12日

A Survey on ChatGPT: AI-Generated Contents, Challenges, and Solutions

Arxiv

54+阅读 · 2023年5月25日

Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond

Arxiv

12+阅读 · 2023年4月26日

Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning

Arxiv

14+阅读 · 2022年3月25日

相关基金

Perilipin-5蛋白调控肝星状细胞激活和高脂饮食性非酒精性脂肪肝的机制

国家自然科学基金

0+阅读 · 2014年12月31日

糖基化终末产物在糖尿病性角膜上皮病变中的作用及其机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

高维近似因子模型框架下的多重检验及其应用

国家自然科学基金

0+阅读 · 2013年12月31日

兼具诱导成骨和成血管的钛材表面工程研究

国家自然科学基金

0+阅读 · 2012年12月31日

Galectin-7在哮喘发病中的调控以及作用机制

国家自然科学基金

0+阅读 · 2012年12月31日

基于冷原子干涉的Casimir-Polder效应研究

国家自然科学基金

0+阅读 · 2011年12月31日

Septin4对日本血吸虫肝纤维化中肝星状细胞的作用及机制的研究

国家自然科学基金

0+阅读 · 2011年12月31日

几何阻挫体系ATO2中自旋、电荷、轨道序及其相互作用研究

国家自然科学基金

0+阅读 · 2011年12月31日

离子通道TRPM2在血管壁内膜增生中的作用

国家自然科学基金

0+阅读 · 2011年12月31日

赤水金钗石斛对实验性糖尿病肾病的防治作用

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员