This research explores temporal concept drift and temporal alignment in knowledge organization systems (KOS). A comparative analysis is conducted using the 1910 Library of Congress Subject Headings (LCSH), 2020 FAST Topical, and automatic indexing. The use case is a sample of 90 nineteenth-century Encyclopedia Britannica entries. The entries were indexed using two approaches: 1) full-text indexing; and 2) entity-based indexing, in which Named Entity Recognition was performed on the entries with Stanza, Stanford's NLP toolkit, and the extracted entities were automatically indexed with the Helping Interdisciplinary Vocabulary application (HIVE), using both 1910 LCSH and FAST Topical. The analysis had three goals: 1) identifying results that were exclusive to the 1910 LCSH output; 2) identifying terms in that exclusive set that have been deprecated from the contemporary LCSH, demonstrating temporal concept drift; and 3) exploring the historical significance of these deprecated terms. The results confirm that historical vocabularies can be used to generate anachronistic subject headings that represent conceptual drift across time in KOS and historical resources. The study makes a methodological contribution by demonstrating how to study changes in KOS over time and how to improve the contextualization of historical humanities resources.
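The first two analysis goals amount to set comparisons over the indexing output, which a minimal Python sketch can illustrate. Everything in it is an assumption made for illustration: the function names, the case-insensitive matching rule, and the placeholder headings ("Term A" and so on) are hypothetical and are not taken from the study's data or from HIVE.

```python
def normalize(heading: str) -> str:
    """Collapse whitespace and ignore case so headings compare loosely (an assumed matching rule)."""
    return " ".join(heading.split()).casefold()


def exclusive_to_historical(historical: set[str], contemporary: set[str]) -> set[str]:
    """Goal 1: headings assigned from the historical vocabulary with no match in the contemporary output."""
    contemporary_keys = {normalize(h) for h in contemporary}
    return {h for h in historical if normalize(h) not in contemporary_keys}


def deprecated(exclusive: set[str], current_vocabulary: set[str]) -> list[str]:
    """Goal 2: exclusive headings that no longer appear in the current vocabulary."""
    current_keys = {normalize(h) for h in current_vocabulary}
    return sorted(h for h in exclusive if normalize(h) not in current_keys)


# Hypothetical placeholder data, not results from the study.
lcsh_1910_output = {"Term A", "Term B", "Term C"}  # headings assigned from 1910 LCSH
fast_output = {"term a", "Term D"}                 # headings assigned from FAST Topical
current_lcsh = {"Term B", "Term D"}                # headings still valid in contemporary LCSH

exclusive = exclusive_to_historical(lcsh_1910_output, fast_output)
dropped = deprecated(exclusive, current_lcsh)
print(dropped)
```

The third goal, interpreting the historical significance of the deprecated terms, is a qualitative step and is not something a sketch like this attempts.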