While Deep Neural Networks (DNNs) are driving major innovations in nearly every field through their powerful automation, we are also witnessing the perils behind automation in the form of bias, such as automated racism, gender bias, and adversarial bias. As the societal impact of DNNs grows, finding effective ways to steer DNNs so that their behavior aligns with human mental models has become indispensable for realizing fair and accountable models. We propose Interactive Attention Alignment (IAA), a novel framework that aims to realize human-steerable DNNs. IAA leverages a DNN model explanation method as an interactive medium through which humans can unveil cases of biased model attention and directly adjust the attention. To improve the DNN using the human-adjusted attention, we introduce GRADIA, a novel computational pipeline that jointly maximizes attention quality and prediction accuracy. We evaluated the IAA framework in Study 1 and GRADIA in Study 2 on a gender classification problem. Study 1 found that applying IAA can significantly improve the perceived quality of model attention in human eyes. In Study 2, we found that using GRADIA can (1) significantly improve the perceived quality of model attention and (2) significantly improve model performance in scenarios where training samples are limited. We present implications for the design of future interactive user interfaces toward human-alignable AI.
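To make the idea of jointly maximizing attention quality and prediction accuracy concrete, below is a minimal sketch of such a joint objective. This is an illustrative assumption, not the paper's actual GRADIA implementation: the loss form, the `joint_loss` function, the weighting factor `lam`, and all tensor shapes are hypothetical.

```python
# Minimal sketch (hypothetical) of a joint attention-alignment objective:
# a standard prediction loss plus a term penalizing divergence between
# the model's attention map and a human-adjusted attention map.
import torch
import torch.nn.functional as F

def joint_loss(logits, labels, model_attn, human_attn, lam=1.0):
    """logits: (B, C) class scores; labels: (B,) ground-truth classes;
    model_attn, human_attn: (B, H, W) attention maps in [0, 1];
    lam: weight trading off attention alignment against accuracy."""
    pred_loss = F.cross_entropy(logits, labels)    # prediction accuracy
    attn_loss = F.l1_loss(model_attn, human_attn)  # attention quality
    return pred_loss + lam * attn_loss
```

In practice, `model_attn` might come from a gradient-based explanation method, and `lam` would be tuned to balance the two terms; the abstract does not specify these details.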