Colonial archives are attracting growing interest from a variety of perspectives, as they contain traces of historically marginalized people. Unfortunately, like most archives, they remain difficult to access due to significant persisting barriers. We focus here on one of them: the biases found in historical finding aids, such as indexes of person names, which remain in use to this day. In colonial archives, indexes can perpetuate silences by omitting mentions of historically marginalized persons. To overcome such limitations and pluralize the scope of existing finding aids, we propose using automated entity recognition. To this end, we contribute a fit-for-purpose annotation typology and apply it to the colonial archive of the Dutch East India Company (VOC). We release a corpus of nearly 70,000 annotations as a shared task, for which we provide baselines using state-of-the-art neural network models. Our work is intended to stimulate further contributions toward broadening access to (colonial) archives, integrating automation as a possible means to this end.