Modern machine learning techniques commonly rely on complex, high-dimensional embedding representations to capture underlying structure in the data and improve performance. To characterize model flaws and choose a desirable representation, model builders often need to compare across multiple embedding spaces, a challenging analytical task supported by few existing tools. We first interviewed nine embedding experts in a variety of fields to characterize the diverse challenges they face and the techniques they use when analyzing embedding spaces. Informed by these perspectives, we developed a novel system called Emblaze that integrates embedding space comparison within a computational notebook environment. Emblaze uses an animated, interactive scatter plot with a novel Star Trail augmentation to enable visual comparison. It also employs novel neighborhood analysis and clustering procedures to dynamically suggest groups of points with interesting changes between spaces. Through a series of case studies with ML experts, we demonstrate how interactive comparison with Emblaze can help users gain new insights into embedding space structure.