The development of Artificial Intelligence (AI), including AI in Science (AIS), should follow the principles of responsible AI. Progress in responsible AI is often quantified through evaluation metrics, yet the robustness and reliability of the metrics themselves have received less attention. We reflect on prior work that examines the robustness of fairness metrics for recommender systems, one type of AI application, and summarise its key takeaways into a non-exhaustive set of guidelines for developing reliable metrics of responsible AI. Our guidelines apply to a broad spectrum of AI applications, including AIS.