In this paper, we investigate the retrievability of datasets and publications in a real-life Digital Library (DL). The retrievability measure was originally developed to quantify the influence that a retrieval system has on access to information. It also enables DL engineers to evaluate their search engine and determine how easily the content in the collection can be accessed. Following this methodology, we propose a system-oriented approach for studying dataset and publication retrieval. A distinctive feature of this paper is its focus on measuring the accessibility biases of different types of DL items, together with the inclusion of a metric of usefulness. Among other metrics, we use Lorenz curves and Gini coefficients to visualize the differences between the two retrievable document types (datasets and publications). Empirical results reported in the paper show clearly distinguishable retrievability scores across documents of different types.
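To make the Lorenz curve and Gini coefficient analysis concrete, the following is a minimal sketch rather than the implementation used in the paper. It assumes the common cumulative form of retrievability, where a document scores one point each time it appears within a rank cutoff for some query. The abstract does not state which variant or cutoff is used. The function names, the `cutoff` parameter, and the toy document identifiers are all hypothetical.

```python
from collections import Counter


def retrievability(rankings, cutoff=10):
    """Cumulative retrievability (assumed variant): count how often each
    document appears in the top `cutoff` results across all queries."""
    scores = Counter()
    for ranked_docs in rankings.values():
        for doc_id in ranked_docs[:cutoff]:
            scores[doc_id] += 1
    return scores


def lorenz_curve(scores):
    """Return Lorenz curve points as (share of documents, share of total
    retrievability), with documents sorted by ascending score."""
    ordered = sorted(scores)
    total = sum(ordered)
    if not ordered or total <= 0:
        raise ValueError("need at least one positive score")
    points = [(0.0, 0.0)]
    running = 0.0
    for i, value in enumerate(ordered, start=1):
        running += value
        points.append((i / len(ordered), running / total))
    return points


def gini(scores):
    """Gini coefficient of non-negative scores: 0 means every document is
    equally retrievable; values near 1 mean a few documents dominate."""
    ordered = sorted(scores)
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive score")
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(ordered, start=1))
    return weighted / (n * total)


# Hypothetical toy data: query -> ranked result list, and document -> type.
rankings = {
    "q1": ["pub1", "pub2", "data1"],
    "q2": ["pub1", "data1", "pub3"],
    "q3": ["pub1", "pub2", "data2"],
}
doc_types = {
    "pub1": "publication", "pub2": "publication", "pub3": "publication",
    "data1": "dataset", "data2": "dataset", "data3": "dataset",
}

r = retrievability(rankings, cutoff=2)
for doc_type in ("publication", "dataset"):
    # Include never-retrieved documents with a score of 0.
    type_scores = [r.get(d, 0) for d, t in doc_types.items() if t == doc_type]
    print(doc_type, "Gini:", round(gini(type_scores), 3))
```

Computing the Gini coefficient separately per document type, with unretrieved documents kept at zero, is one way to compare how unevenly the search engine exposes datasets versus publications. The Lorenz points can be passed to any plotting library to draw the corresponding curves.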