Informed Machine Learning, Centrality, CNN, Relevant Document Detection, Repatriation of Indigenous Human Remains

Among the pressing issues facing Australian and other First Nations peoples is the repatriation of their ancestors' bodily remains, which are currently held in Western scientific institutions. Securing the return of these remains to their communities for reburial depends largely on locating information, within scientific and other literature published between 1790 and 1970, that documents their theft, donation, sale, or exchange between institutions. This article reports on collaborative research by data scientists and social science researchers in the Research, Reconcile, Renew Network (RRR) to develop and apply text mining techniques to identify this vital information. We describe our work to date on a machine learning-based solution that automates the process of finding and semantically analysing relevant texts. Classification models, particularly deep learning-based models, are known to have low accuracy when trained on only a small number of labelled (i.e. relevant/non-relevant) documents. To improve the accuracy of our detection model, we explore the use of an Informed Neural Network (INN) model that describes documentary content using expert-informed contextual knowledge. Only a few labelled documents are used to provide specificity to the model, using conceptually related keywords identified by RRR experts in provenance research. The results confirm the value of using an INN model for identifying documents relevant to the investigation of the global commercial trade in Indigenous human remains. Empirical analysis suggests that this INN model can be generalized for use by other researchers in the social sciences and humanities who want to extract relevant information from large textual corpora.

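The abstract does not specify how expert keywords enter the model, so the following is only a minimal sketch of one way expert-chosen keywords could be turned into document features. It is not the authors' INN. The concept groups, the keyword lists, and the names `concept_features` and `is_candidate` are all hypothetical and chosen purely for illustration.

```python
import re
from collections import Counter

# Hypothetical concept groups. The real keyword lists come from RRR experts
# and are not given in the abstract.
CONCEPT_KEYWORDS = {
    "remains": {"skull", "skeleton", "cranium", "bones"},
    "transfer": {"donated", "purchased", "exchanged", "presented"},
    "institution": {"museum", "society", "collection"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def concept_features(text, concepts=CONCEPT_KEYWORDS):
    """Return, per concept, the share of tokens that match its keywords."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return {
        name: sum(counts[word] for word in words) / total
        for name, words in concepts.items()
    }


def is_candidate(text, min_concepts=2):
    """Flag a document when at least `min_concepts` concept groups occur."""
    features = concept_features(text)
    return sum(1 for value in features.values() if value > 0) >= min_concepts
```

In a setup like the one the abstract describes, features of this kind might be supplied to a neural classifier alongside the raw text, so that the few labelled documents only need to refine what the keywords already suggest. The threshold rule in `is_candidate` is a stand-in for that learned step, not a substitute for it.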