We present the first full description of Media Cloud, an open-source platform based on crawling hyperlink structure that has been in operation for over 10 years and that, for many uses, will be the best way to collect data for studying the media ecosystem on the open web. We document the key choices behind what data Media Cloud collects and stores, how it processes and organizes these data, and the open API access and user-facing tools it provides. We also highlight the strengths and limitations of the Media Cloud collection strategy compared to relevant alternatives. We give an overview of two sample datasets generated using Media Cloud and discuss how researchers can use the platform to create their own datasets.