Within the cultural heritage sector, there has been a growing and concerted effort to adopt a critical sociotechnical lens when applying machine learning techniques to digital collections. Though the cultural heritage community has collectively developed an emerging body of work detailing responsible operations for machine learning in libraries and other cultural heritage institutions at the organizational level, there remains a paucity of guidelines created specifically for practitioners embarking on machine learning projects. The manifold stakes and sensitivities involved in applying machine learning to cultural heritage underscore the importance of developing such guidelines. This paper addresses this need by formulating a detailed checklist of guiding questions and practices that can be employed while developing a machine learning project that utilizes cultural heritage data. I call the resulting checklist the "Collections as ML Data" checklist; once completed, it can be published with the deliverables of the project. By surveying existing projects, including my own project, Newspaper Navigator, I justify the "Collections as ML Data" checklist and demonstrate how the formulated guiding questions can be employed and operationalized.