Unsilencing Colonial Archives via Automated Entity Recognition

Colonial archives are attracting growing interest from a variety of perspectives, as they contain traces of historically marginalized people. Unfortunately, like most archives, they remain difficult to access because of significant, persistent barriers. We focus here on one of them: the biases found in historical finding aids, such as indexes of person names, which remain in use to this day. In colonial archives, indexes can perpetuate silences by omitting mentions of historically marginalized persons. To overcome such limitations and pluralize the scope of existing finding aids, we propose using automated entity recognition. To this end, we contribute a fit-for-purpose annotation typology and apply it to the colonial archive of the Dutch East India Company (VOC). We release a corpus of nearly 70,000 annotations as a shared task, for which we provide baselines using state-of-the-art neural network models. Our work is intended to stimulate further contributions toward broadening access to (colonial) archives, with automation as one possible means to this end.

