Machine learning on data streams is increasingly present in multiple domains. However, data distributions often shift over time, which can lead machine learning models to make incorrect decisions. While there are automatic methods to detect when drift is happening, human analysis, often by data scientists, is essential to diagnose the causes of the problem and adjust the system. We propose Data+Shift, a visual analytics tool to support data scientists in investigating the underlying factors of shift in data features in the context of fraud detection. Design requirements were derived from interviews with data scientists. Data+Shift is integrated with JupyterLab and can be used alongside other data science tools. We validated our approach with a think-aloud experiment in which a data scientist used the tool for a fraud detection use case.