Use and Misuse of Machine Learning in Anthropology

Machine learning (ML) is now widely accessible to the research community at large, which has fostered a proliferation of new and striking applications of these emergent mathematical techniques across a wide range of disciplines. In this paper, we focus on a particular case study: the field of paleoanthropology, which seeks to understand the evolution of the human species based on biological and cultural evidence. As we will show, the easy availability of ML algorithms, combined with a lack of expertise in their proper use among the anthropological research community, has led to foundational misapplications that appear throughout the literature. The resulting unreliable results not only undermine efforts to legitimately incorporate ML into anthropological research, but also risk producing faulty understandings of our evolutionary and behavioral past. The aim of this paper is to provide a brief introduction to some of the ways in which ML has been applied within paleoanthropology; we also include a survey of some basic ML algorithms for those who are not fully conversant with the field, which remains under active development. We discuss a series of missteps, errors, and violations of correct protocol in the use of ML methods that appear disconcertingly often within the accumulating body of anthropological literature. These mistakes include the use of outdated algorithms and practices; inappropriate train/test splits, sample composition, and textual explanations; and an absence of transparency due to the lack of data and code sharing, which limits independent replication. We assert that expanding samples, sharing data and code, re-evaluating approaches to peer review, and, most importantly, developing interdisciplinary teams that include experts in ML are all necessary for progress in future research incorporating ML within anthropology.
