Complex machine learning models are now deployed in several critical domains, including healthcare and autonomous vehicles, yet they largely operate as functional black boxes. Consequently, there has been a recent surge of interest in interpreting the decisions of such complex models in order to explain their actions to humans. Models that conform to human interpretation of a task are more desirable in certain contexts and can help attribute liability, build trust, expose biases, and in turn build better models. It is therefore crucial to understand how, and which, models conform to human understanding of tasks. In this paper, we present a large-scale crowdsourcing study that reveals and quantifies the dissonance between human and machine understanding, through the lens of an image classification task. In particular, we seek to answer the following questions: Which (well-performing) complex ML models are closer to humans in their use of features to make accurate predictions? How does task difficulty affect the feature-selection capability of machines in comparison to humans? Are humans consistently better at selecting features that make image recognition more accurate? Our findings have important implications for human-machine collaboration, considering that a long-term goal in the field of artificial intelligence is to make machines capable of learning and reasoning like humans.