CML-COVID: A Large-Scale COVID-19 Twitter Dataset with Latent Topics, Sentiment and Location Information - Zhuanzhi (专知) Papers

INFORMS · COVID-19 · Twitter · Interpretability · Datasets

January 28, 2021

CML-COVID: A Large-Scale COVID-19 Twitter Dataset with Latent Topics, Sentiment and Location Information

Hassan Dashtian, Dhiraj Murthy

From arXiv: 6 pages, 4 figures, and 3 tables

Twitter has been a significant public space for discussion of the COVID-19 pandemic. Public social media platforms such as Twitter are important sites of engagement around the pandemic, and their data can support social, health, and other research. Understanding public opinion about COVID-19 and how information diffuses through social media is important for governments and research institutions. Because Twitter is a ubiquitous public platform, it has considerable utility for understanding public perceptions, behavior, and attitudes related to COVID-19. In this research, we present CML-COVID, a COVID-19 Twitter dataset of 19,298,967 tweets from 5,977,653 unique individuals, and summarize some attributes of these data. The tweets were collected between March 2020 and July 2020 using the COVID-19-related query terms coronavirus, covid, and mask. We use topic modeling, sentiment analysis, and descriptive statistics to describe the collected tweets and, where available, their geographical locations. We provide information on how to access our tweet dataset (archived using twarc).
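To make the descriptive statistics in the abstract concrete, here is a minimal Python sketch of how tweet and unique-user counts could be computed for the three query terms. It is not the authors' pipeline: the record keys `user_id` and `text`, the function names, and the case-insensitive substring matching are all assumptions made for illustration, and Twitter's own query matching may behave differently.

```python
import re
from collections import Counter

# Query terms named in the abstract.
QUERY_TERMS = ("coronavirus", "covid", "mask")

# Assumption: case-insensitive substring matching, so "COVID-19" and "masks" both match.
_PATTERN = re.compile("|".join(QUERY_TERMS), re.IGNORECASE)


def matched_terms(text):
    """Return the set of query terms that occur anywhere in the tweet text."""
    return {m.group(0).lower() for m in _PATTERN.finditer(text)}


def summarize(tweets):
    """Count matching tweets, unique users, and per-term tweet counts.

    `tweets` is an iterable of dicts with the hypothetical keys
    'user_id' and 'text'; real tweet JSON is nested differently.
    """
    n_tweets = 0
    users = set()
    term_counts = Counter()
    for tweet in tweets:
        terms = matched_terms(tweet["text"])
        if not terms:
            continue  # tweet mentions none of the query terms
        n_tweets += 1
        users.add(tweet["user_id"])
        term_counts.update(terms)  # each term counted once per tweet
    return {
        "tweets": n_tweets,
        "unique_users": len(users),
        "term_counts": dict(term_counts),
    }


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        {"user_id": 1, "text": "Wear a mask to slow COVID-19"},
        {"user_id": 2, "text": "Coronavirus cases are rising"},
        {"user_id": 1, "text": "New covid guidance today"},
        {"user_id": 3, "text": "Nice weather"},
    ]
    print(summarize(sample))
```

Counting each term at most once per tweet keeps the per-term totals comparable to the tweet count. A stricter token-based match would exclude words that merely contain a query term.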
