Diversity Searcher is a tool originally developed to help analyse diversity in news media texts. It relies on a form of automated content analysis and therefore rests on prior assumptions and depends on certain design choices related to diversity and fairness. One such design choice concerns the external knowledge source(s) used. In this article, we discuss the implications these sources can have for the results of content analysis. We compare two data sources that Diversity Searcher has worked with, DBpedia and Wikidata, with respect to their ontological coverage and diversity, and describe the implications for the resulting analyses of text corpora. We present a case study of the relative over- or under-representation of Belgian political parties between 1990 and 2020 in the English-language DBpedia, the Dutch-language DBpedia, and Wikidata. We highlight the many decisions required in designing this data analysis, the assumptions behind them, and the implications of the results. In particular, we found a staggering over-representation of the political right in the English-language DBpedia.