Presenting a predictive model's performance is a communication bottleneck that threatens collaborations between data scientists and subject matter experts (SMEs). Accuracy and error metrics alone fail to tell the whole story of a model (its risks, strengths, and limitations), making it difficult for SMEs to feel confident in their decision to use it. As a result, models may fail in unexpected ways or go entirely unused, as SMEs disregard poorly presented models in favor of familiar, yet arguably substandard, methods. In this paper, we describe an iterative study conducted with both SMEs and data scientists to understand the gaps in communication between these two groups. We find that, while the two groups share the goal of understanding the data and the model's predictions, friction can stem from unfamiliar terms, metrics, and visualizations, which limits the transfer of knowledge to SMEs and discourages them from asking clarifying questions during presentations. Based on our findings, we derive a set of communication guidelines that use visualization as a common medium for communicating the strengths and weaknesses of a model. We demonstrate our guidelines in a regression modeling scenario and elicit feedback on their use from SMEs. In our demonstration, SMEs were more comfortable discussing a model's performance, more aware of the trade-offs of the presented model, and better equipped to assess the model's risks, ultimately informing and contextualizing the model's use beyond text and numbers.