An independent ethical assessment of an artificial intelligence system is an impartial examination of the system's development, deployment, and use in alignment with ethical values. Over the past few years, two kinds of tools have emerged: system-level qualitative frameworks that describe high-level requirements, and component-level quantitative metrics that measure individual ethical dimensions. However, a gap remains between the two, which hinders the execution of independent ethical assessments in practice. This study bridges that gap by designing a holistic independent ethical assessment process for a text classification model, with a special focus on the task of hate speech detection. The assessment covers technical performance, data bias, embedding bias, classification bias, and interpretability, and is further augmented with protected-attribute mining and counterfactual-based analysis to strengthen the bias assessment. The proposed process is demonstrated through an assessment of a deep hate speech detection model.