Media Cloud: Massive Open Source Collection of Global News on the Open Web

Hal Roberts, Rahul Bhargava, Linas Valiukas, Dennis Jen, Momin M. Malik, Cindy Bishop, Emily Ndulue, Aashka Dave, Justin Clark, Bruce Etling, Rob Faris, Anushka Shah, Jasmin Rubinovitz, Alexis Hope, Catherine D'Ignazio, Fernando Bermejo, Yochai Benkler, Ethan Zuckerman

From arXiv: 13 pages, 9 figures; accepted for publication and forthcoming in Proceedings of the Fifteenth International AAAI Conference on Web and Social Media (ICWSM-2021). This version adds a 1-page appendix.

We present the first full description of Media Cloud, an open source platform based on crawling hyperlink structure that has been in operation for over 10 years and that, for many uses, will be the best way to collect data for studying the media ecosystem on the open web. We document the key choices behind what data Media Cloud collects and stores, how it processes and organizes these data, and how it provides open API access as well as user-facing tools. We also highlight the strengths and limitations of the Media Cloud collection strategy compared to relevant alternatives. We give an overview of two sample datasets generated using Media Cloud and discuss how researchers can use the platform to create their own datasets.
