In a perfect world, all articles would consistently contain sufficient metadata to describe the resource. We know this is not the reality, so we are motivated to investigate how the metadata that authors and publishers do supply has evolved. Because applying metadata takes time, we recognize that each news article author has a limited metadata budget of time and effort to spend. How are they spending this budget? What are the top metadata categories in use? How have they grown over time? What purpose do they serve? We also recognize that not all metadata fields are used equally. How have individual fields grown over time? Which fields were adopted fastest? In this paper, we review 227,726 HTML news articles from 29 outlets captured by the Internet Archive between 1998 and 2016. Upon reviewing the metadata fields in each article, we discovered that 2010 marked the beginning of a metadata renaissance, as publishers embraced metadata for improved search engine ranking, search engine tracking, social media tracking, and social media sharing. When analyzing individual fields, we find that one application of metadata stands out above all others: social cards, the cards generated by platforms like Twitter when a user shares a URL. Once a metadata standard for cards was established in 2010, its fields were adopted by 20% of articles in the first year and by more than 95% of articles by 2016. This rate of adoption surpasses that of efforts like Schema.org and Dublin Core by a fair margin. Confronted with these results on how news publishers spend their metadata budget, we must conclude that it is all about the cards.
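The abstract does not describe how adoption was measured. The following is a minimal sketch, not the authors' pipeline. It assumes each archived article is available as a hypothetical `(year, html)` pair, and it assumes card fields can be recognized by the `og:` (Open Graph) and `twitter:` prefixes on `<meta>` elements. It then reports, per year, the fraction of articles carrying at least one card field. The prefix list and all function names are illustrative assumptions.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Assumption: card metadata is identified by these attribute prefixes.
# The paper's actual field categories may differ.
CARD_PREFIXES = ("og:", "twitter:")


class MetaFieldParser(HTMLParser):
    """Collects the name/property of every <meta> element in a document."""

    def __init__(self):
        super().__init__()
        self.fields = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also reach this method through
        # HTMLParser's default handle_startendtag.
        if tag != "meta":
            return
        attrs = dict(attrs)
        key = attrs.get("property") or attrs.get("name")
        if key:
            self.fields.add(key.strip().lower())


def meta_fields(html):
    """Return the set of lowercased meta field names found in one page."""
    parser = MetaFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


def card_adoption_by_year(articles):
    """articles: iterable of (year, html). Returns {year: share with a card field}."""
    totals = defaultdict(int)
    with_cards = defaultdict(int)
    for year, html in articles:
        totals[year] += 1
        # Count an article once if it carries any card field at all.
        if any(f.startswith(CARD_PREFIXES) for f in meta_fields(html)):
            with_cards[year] += 1
    return {year: with_cards[year] / totals[year] for year in sorted(totals)}


if __name__ == "__main__":
    sample = [
        (2009, '<html><head><meta name="description" content="x"></head></html>'),
        (2010, '<html><head><meta property="og:title" content="x"></head></html>'),
        (2010, '<html><head><meta name="keywords" content="a,b"></head></html>'),
    ]
    print(card_adoption_by_year(sample))  # {2009: 0.0, 2010: 0.5}
```

The function counts each article at most once, so the result is an article-level adoption rate, which matches the phrasing "adopted by 20% of articles". Per-field growth could be measured the same way by tallying individual entries of `meta_fields` instead of a single boolean per article.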