While the need for well-trained, fair ML systems continues to grow, measuring fairness for modern models and datasets becomes increasingly difficult as they grow at an unprecedented pace. One key challenge in scaling common fairness metrics to such models and datasets is the requirement of exhaustive ground-truth labeling, which is not always feasible. Indeed, this often rules out the application of traditional analysis metrics and systems. At the same time, ML-fairness assessments cannot be made algorithmically, as fairness is a highly subjective matter. Thus, domain experts need to be able to extract and reason about bias throughout models and datasets to make informed decisions. While visual analysis tools are of great help when investigating potential bias in DL models, none of the existing approaches have been designed for the specific tasks and challenges that arise in large label spaces. Addressing this lack of visualization work, we propose guidelines for designing visualizations for such large label spaces, considering both technical and ethical issues. Our proposed visualization approach can be integrated into classical model and data pipelines, and we provide an open-source implementation of our techniques as a TensorBoard plug-in. With our approach, different models and datasets for large label spaces can be systematically and visually analyzed and compared, enabling informed fairness assessments that tackle problematic bias.
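To illustrate how such an approach can hook into an existing pipeline, the following is a minimal sketch of logging per-label statistics as standard TensorBoard summaries, which a fairness-visualization plug-in could then read from the same log directory as the rest of the training run. The log path, label names, and counts are purely illustrative assumptions and do not reflect the actual API of the plug-in described in the paper.

```python
import tensorflow as tf

# Hypothetical example: write per-label prediction counts as TensorBoard
# summaries so that a fairness-visualization plug-in could pick them up
# alongside the usual training logs. Path and labels are placeholders.
writer = tf.summary.create_file_writer("logs/fairness_demo")

# Toy counts for a few labels out of a large label space (illustrative only).
label_counts = {"person": 1200, "dog": 87, "wheelchair": 3}

with writer.as_default():
    for label, count in label_counts.items():
        # One scalar tag per label; a plug-in could aggregate these across
        # labels to surface under-represented classes without requiring
        # exhaustive ground-truth labels for every example.
        tf.summary.scalar(f"label_count/{label}", count, step=0)

writer.flush()
```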