The ML community recognizes the importance of anticipating and mitigating the potential negative impacts of benchmark research. In this position paper, we argue that more attention needs to be paid to areas of ethical risk that lie at the technical and scientific core of ML benchmarks. We identify overlooked structural similarities between human IQ and ML benchmarks: both set standards for describing, evaluating, and comparing performance on tasks relevant to intelligence. These similarities allow us to draw lessons from feminist philosophy of science scholarship that the ML benchmark community needs to consider. Finally, we outline practical recommendations for benchmark research ethics and ethics review.