MIND - 主流和独立新闻文件公司 (MIND - Mainstream and Independent News Documents Corpus) - 专知论文

会员服务 ·

0

相互独立的 · Performer · Better · Processing（编程语言） · 相似度 ·

2021 年 8 月 13 日

MIND - Mainstream and Independent News Documents Corpus

翻译：MIND - 主流和独立新闻文件公司

Danielle Caled,Paula Carvalho,Mário J. Silva

This paper presents and characterizes MIND, a new Portuguese corpus comprised of different types of articles collected from online mainstream and alternative media sources, over a 10-month period. The articles in the corpus are organized into five collections: facts, opinions, entertainment, satires, and conspiracy theories. Throughout this paper, we explain how the data collection process was conducted, and present a set of linguistic metrics that allow us to perform a preliminary characterization of the texts included in the corpus. Also, we deliver an analysis of the most frequent topics in the corpus, and discuss the main differences and similarities among the collections considered. Finally, we enumerate some tasks and applications that could benefit from this corpus, in particular the ones (in)directly related to misinformation detection. Overall, our contribution of a corpus and initial analysis are designed to support future exploratory news studies, and provide a better insight into misinformation.

翻译：本文介绍并描述由10个月期间从网上主流和替代性媒体来源收集的不同类型文章组成的葡萄牙新文集MIND, 文集中的文章分为五类:事实、意见、娱乐、讽刺和阴谋理论。我们在整个文件中解释了数据收集过程是如何进行的,并提出了一套语言指标,使我们能够对文集中包括的案文进行初步定性。此外,我们分析了文集中最经常出现的主题,并讨论了所考虑的文集之间的主要差异和相似之处。最后,我们列举了本文集中可以受益的一些任务和应用,特别是与发现错误信息直接相关的那些任务和应用。总体而言,我们编写文集和初步分析的目的是支持未来的探索性新闻研究,并更好地了解错误信息。

0

相关内容

相互独立的

相互独立的

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

【东大-UCSB】虚假新闻检测的自然语言处理研究综述，A Survey on Natural Language Processing for Fake News Detection

【东大-UCSB】虚假新闻检测的自然语言处理研究综述，A Survey on Natural Language Processing for Fake News Detection

专知会员服务

79+阅读 · 2020年2月12日

【2019 北京智源大会】Recent Breakthroughs in Natural Language Processing（NLP的最新突破） Christopher Manning / 斯坦福人工智能实验室（SAIL）负责人

【2019 北京智源大会】Recent Breakthroughs in Natural Language Processing（NLP的最新突破） Christopher Manning / 斯坦福人工智能实验室（SAIL）负责人

专知会员服务

10+阅读 · 2019年11月1日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

IEEE | DSC 2019诚邀稿件 (EI检索)

IEEE | DSC 2019诚邀稿件 (EI检索)

Call4Papers

10+阅读 · 2019年2月25日

已删除

将门创投

3+阅读 · 2019年1月29日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【NIPS2018】接收论文列表

【NIPS2018】接收论文列表

专知

5+阅读 · 2018年9月10日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

计算机类 | 期刊专刊截稿信息9条

计算机类 | 期刊专刊截稿信息9条

Call4Papers

4+阅读 · 2018年1月26日

【推荐】免费书(草稿)：数据科学的数学基础

【推荐】免费书(草稿)：数据科学的数学基础

机器学习研究会

20+阅读 · 2017年10月1日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

Accessible Visualization via Natural Language Descriptions: A Four-Level Model of Semantic Content

Arxiv

0+阅读 · 2021年10月8日

A Semi-Personalized System for User Cold Start Recommendation on Music Streaming Apps

Arxiv

11+阅读 · 2021年6月7日

Pretrained Transformers for Text Ranking: BERT and Beyond

Arxiv

28+阅读 · 2020年10月13日

A Survey on the Evolution of Stream Processing Systems

A Survey on the Evolution of Stream Processing Systems

Arxiv

9+阅读 · 2020年8月3日

Embedding-based Retrieval in Facebook Search

Arxiv

12+阅读 · 2020年6月20日

Recent Advances and Challenges in Task-oriented Dialog System

Recent Advances and Challenges in Task-oriented Dialog System

Arxiv

19+阅读 · 2020年3月19日

Mining Disinformation and Fake News: Concepts, Methods, and Recent Advancements

Mining Disinformation and Fake News: Concepts, Methods, and Recent Advancements

Arxiv

16+阅读 · 2020年1月2日

Viscovery: Trend Tracking in Opinion Forums based on Dynamic Topic Models

Arxiv

5+阅读 · 2018年5月1日

PEYMA: A Tagged Corpus for Persian Named Entities

Arxiv

5+阅读 · 2018年1月30日

SentiPers: A Sentiment Analysis Corpus for Persian

Arxiv

5+阅读 · 2018年1月23日

VIP会员

文章信息

相关主题

相互独立的

Processing（编程语言）

相关VIP内容

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

【东大-UCSB】虚假新闻检测的自然语言处理研究综述，A Survey on Natural Language Processing for Fake News Detection

【东大-UCSB】虚假新闻检测的自然语言处理研究综述，A Survey on Natural Language Processing for Fake News Detection

专知会员服务

79+阅读 · 2020年2月12日

【2019 北京智源大会】Recent Breakthroughs in Natural Language Processing（NLP的最新突破） Christopher Manning / 斯坦福人工智能实验室（SAIL）负责人

【2019 北京智源大会】Recent Breakthroughs in Natural Language Processing（NLP的最新突破） Christopher Manning / 斯坦福人工智能实验室（SAIL）负责人

专知会员服务

10+阅读 · 2019年11月1日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

视觉-语言-动作模型解析：从模块构成到里程碑与挑战

《解析陆域作战方向：一个概念性框架》报告

【博士论文】基于多模态基础模型的上下文学习

追寻真正的AI自主性：从遗留思维到战场优势

相关资讯

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

IEEE | DSC 2019诚邀稿件 (EI检索)

IEEE | DSC 2019诚邀稿件 (EI检索)

Call4Papers

10+阅读 · 2019年2月25日

已删除

将门创投

3+阅读 · 2019年1月29日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【NIPS2018】接收论文列表

【NIPS2018】接收论文列表

专知

5+阅读 · 2018年9月10日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

计算机类 | 期刊专刊截稿信息9条

计算机类 | 期刊专刊截稿信息9条

Call4Papers

4+阅读 · 2018年1月26日

【推荐】免费书(草稿)：数据科学的数学基础

【推荐】免费书(草稿)：数据科学的数学基础

机器学习研究会

20+阅读 · 2017年10月1日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

相关论文

Accessible Visualization via Natural Language Descriptions: A Four-Level Model of Semantic Content

Arxiv

0+阅读 · 2021年10月8日

A Semi-Personalized System for User Cold Start Recommendation on Music Streaming Apps

Arxiv

11+阅读 · 2021年6月7日

Pretrained Transformers for Text Ranking: BERT and Beyond

Arxiv

28+阅读 · 2020年10月13日

A Survey on the Evolution of Stream Processing Systems

A Survey on the Evolution of Stream Processing Systems

Arxiv

9+阅读 · 2020年8月3日

Embedding-based Retrieval in Facebook Search

Arxiv

12+阅读 · 2020年6月20日

Recent Advances and Challenges in Task-oriented Dialog System

Recent Advances and Challenges in Task-oriented Dialog System

Arxiv

19+阅读 · 2020年3月19日

Mining Disinformation and Fake News: Concepts, Methods, and Recent Advancements

Mining Disinformation and Fake News: Concepts, Methods, and Recent Advancements

Arxiv

16+阅读 · 2020年1月2日

Viscovery: Trend Tracking in Opinion Forums based on Dynamic Topic Models

Arxiv

5+阅读 · 2018年5月1日

PEYMA: A Tagged Corpus for Persian Named Entities

Arxiv

5+阅读 · 2018年1月30日

SentiPers: A Sentiment Analysis Corpus for Persian

Arxiv

5+阅读 · 2018年1月23日

微信扫码咨询专知VIP会员