Delving into ChatGPT usage in academic writing through excess vocabulary - 专知论文

会员服务 ·

0

词表 · PubMed · 大语言模型 · MoDELS · ChatGPT ·

Delving into ChatGPT usage in academic writing through excess vocabulary

翻译：暂无翻译

Dmitry Kobak,Rita González-Márquez,Emőke-Ágnes Horvát,Jan Lause

from arxiv, v2: Updating dataset, figures and numbers to include all PubMed abstracts until end of June 2024

Recent large language models (LLMs) can generate and revise text with human-level performance, and have been widely commercialized in systems like ChatGPT. These models come with clear limitations: they can produce inaccurate information, reinforce existing biases, and be easily misused. Yet, many scientists have been using them to assist their scholarly writing. How wide-spread is LLM usage in the academic literature currently? To answer this question, we use an unbiased, large-scale approach, free from any assumptions on academic LLM usage. We study vocabulary changes in 14 million PubMed abstracts from 2010-2024, and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. Our analysis based on excess words usage suggests that at least 10% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, and was as high as 30% for some PubMed sub-corpora. We show that the appearance of LLM-based writing assistants has had an unprecedented impact in the scientific literature, surpassing the effect of major world events such as the Covid pandemic.

翻译：暂无翻译

0

相关内容

《生成式模型: 变分自编码器与扩散模型》，75页ppt，Google DeepMind科学家Ruiqi Gao

《生成式模型: 变分自编码器与扩散模型》，75页ppt，Google DeepMind科学家Ruiqi Gao

专知会员服务

64+阅读 · 2023年6月10日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

78+阅读 · 2020年7月26日

【ACL2020】多模态信息抽取，365页ppt

【ACL2020】多模态信息抽取，365页ppt

专知会员服务

144+阅读 · 2020年7月6日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

28+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

47+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

34+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

151+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

174+阅读 · 2019年10月11日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

39+阅读 · 2019年10月9日

学习自然语言处理路线图

学习自然语言处理路线图

专知会员服务

137+阅读 · 2019年9月24日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

27+阅读 · 2019年5月18日

Github项目推荐 | 股市预测的机器学习/深度学习模型/资源集锦

Github项目推荐 | 股市预测的机器学习/深度学习模型/资源集锦

AI研习社

32+阅读 · 2019年4月18日

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

AI研习社

31+阅读 · 2019年4月5日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

17+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

42+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

16+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

论文浅尝 | 嵌入常识知识的注意力 LSTM 模型用于特定目标的基于侧面的情感分析

论文浅尝 | 嵌入常识知识的注意力 LSTM 模型用于特定目标的基于侧面的情感分析

开放知识图谱

28+阅读 · 2018年6月11日

STRCF for Visual Object Tracking

STRCF for Visual Object Tracking

统计学习与视觉计算组

14+阅读 · 2018年5月29日

语义Web知识库补全关键技术研究

国家自然科学基金

11+阅读 · 2017年12月31日

“Fishes-in-net” 酵母孢子微胶囊式近平滑假丝酵母SCRII酶有机相高效手性合成机制研究

国家自然科学基金

2+阅读 · 2016年12月31日

第七届全国数学文化论坛

国家自然科学基金

3+阅读 · 2016年12月31日

Musielak-Orlicz-Sobolev 空间中的迹嵌入及其应用

国家自然科学基金

1+阅读 · 2015年12月31日

汉英篇章衔接对齐资源构建与分析研究

国家自然科学基金

2+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

37+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

1+阅读 · 2015年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

海量Web用户生成内容物化关键技术

国家自然科学基金

1+阅读 · 2014年12月31日

概率抽样设计及其统计推断方法

国家自然科学基金

4+阅读 · 2014年12月31日

MLRIP: Pre-training a military language representation model with informative factual knowledge and professional knowledge base

Arxiv

33+阅读 · 2022年7月28日

A psychological theory of explainability

Arxiv

16+阅读 · 2022年5月17日

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering

Arxiv

15+阅读 · 2021年9月22日

AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing

Arxiv

23+阅读 · 2021年8月12日

ResMLP: Feedforward networks for image classification with data-efficient training

Arxiv

12+阅读 · 2021年5月7日

A continual learning survey: Defying forgetting in classification tasks

Arxiv

32+阅读 · 2021年4月16日

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Arxiv

11+阅读 · 2020年10月20日

Pre-trained Models for Natural Language Processing: A Survey

Arxiv

113+阅读 · 2020年3月18日

Enhanced Meta-Learning for Cross-lingual Named Entity Recognition with Minimal Resources

Arxiv

13+阅读 · 2019年11月14日

Hyper-SAGNN: a self-attention based graph neural network for hypergraphs

Hyper-SAGNN: a self-attention based graph neural network for hypergraphs

Arxiv

17+阅读 · 2019年11月6日

VIP会员

文章信息

相关主题

大语言模型

相关VIP内容

《生成式模型: 变分自编码器与扩散模型》，75页ppt，Google DeepMind科学家Ruiqi Gao

《生成式模型: 变分自编码器与扩散模型》，75页ppt，Google DeepMind科学家Ruiqi Gao

专知会员服务

64+阅读 · 2023年6月10日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

78+阅读 · 2020年7月26日

【ACL2020】多模态信息抽取，365页ppt

【ACL2020】多模态信息抽取，365页ppt

专知会员服务

144+阅读 · 2020年7月6日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

28+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

47+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

34+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

151+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

174+阅读 · 2019年10月11日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

39+阅读 · 2019年10月9日

学习自然语言处理路线图

学习自然语言处理路线图

专知会员服务

137+阅读 · 2019年9月24日

热门VIP内容

相关资讯

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

27+阅读 · 2019年5月18日

Github项目推荐 | 股市预测的机器学习/深度学习模型/资源集锦

Github项目推荐 | 股市预测的机器学习/深度学习模型/资源集锦

AI研习社

32+阅读 · 2019年4月18日

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

AI研习社

31+阅读 · 2019年4月5日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

17+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

42+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

16+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

论文浅尝 | 嵌入常识知识的注意力 LSTM 模型用于特定目标的基于侧面的情感分析

论文浅尝 | 嵌入常识知识的注意力 LSTM 模型用于特定目标的基于侧面的情感分析

开放知识图谱

28+阅读 · 2018年6月11日

STRCF for Visual Object Tracking

STRCF for Visual Object Tracking

统计学习与视觉计算组

14+阅读 · 2018年5月29日

相关论文

MLRIP: Pre-training a military language representation model with informative factual knowledge and professional knowledge base

Arxiv

33+阅读 · 2022年7月28日

A psychological theory of explainability

Arxiv

16+阅读 · 2022年5月17日

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering

Arxiv

15+阅读 · 2021年9月22日

AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing

Arxiv

23+阅读 · 2021年8月12日

ResMLP: Feedforward networks for image classification with data-efficient training

Arxiv

12+阅读 · 2021年5月7日

A continual learning survey: Defying forgetting in classification tasks

Arxiv

32+阅读 · 2021年4月16日

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Arxiv

11+阅读 · 2020年10月20日

Pre-trained Models for Natural Language Processing: A Survey

Arxiv

113+阅读 · 2020年3月18日

Enhanced Meta-Learning for Cross-lingual Named Entity Recognition with Minimal Resources

Arxiv

13+阅读 · 2019年11月14日

Hyper-SAGNN: a self-attention based graph neural network for hypergraphs

Hyper-SAGNN: a self-attention based graph neural network for hypergraphs

Arxiv

17+阅读 · 2019年11月6日

相关基金

语义Web知识库补全关键技术研究

国家自然科学基金

11+阅读 · 2017年12月31日

“Fishes-in-net” 酵母孢子微胶囊式近平滑假丝酵母SCRII酶有机相高效手性合成机制研究

国家自然科学基金

2+阅读 · 2016年12月31日

第七届全国数学文化论坛

国家自然科学基金

3+阅读 · 2016年12月31日

Musielak-Orlicz-Sobolev 空间中的迹嵌入及其应用

国家自然科学基金

1+阅读 · 2015年12月31日

汉英篇章衔接对齐资源构建与分析研究

国家自然科学基金

2+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

37+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

1+阅读 · 2015年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

海量Web用户生成内容物化关键技术

国家自然科学基金

1+阅读 · 2014年12月31日

概率抽样设计及其统计推断方法

国家自然科学基金

4+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员