This paper introduces embComp, a novel approach for comparing two embeddings that capture the similarity between objects, such as word and document embeddings. We survey scenarios where comparing these embedding spaces is useful. From those scenarios, we derive common tasks, introduce visual analysis methods that support these tasks, and combine them into a comprehensive system. Central to embComp are overview visualizations based on metrics that measure differences in the local structure around objects. Summarizing these local metrics over the embeddings provides global overviews of similarities and differences. Detail views support comparing the local structure around selected objects and relating this local information to the global views. By integrating and connecting all of these components, embComp supports a range of analysis workflows that help users understand similarities and differences between embedding spaces. We assess our approach by applying it in several use cases, including understanding differences between corpora via word vector embeddings and understanding differences between the algorithms that generate embeddings.
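To make the idea of a local-structure metric concrete, the following minimal sketch computes, for each object shared by two embeddings, the Jaccard overlap of its k nearest neighbors in each space, and then summarizes those per-object scores with a mean. This is an illustration under stated assumptions and is not necessarily one of the metrics embComp uses: the choice of cosine similarity, the Jaccard overlap, the function names, and the toy vectors are all hypothetical.

```python
import math
from statistics import mean


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(embedding, obj, k):
    """Set of the k objects most similar to `obj` within one embedding."""
    target = embedding[obj]
    others = [o for o in embedding if o != obj]
    others.sort(key=lambda o: cosine_similarity(target, embedding[o]), reverse=True)
    return set(others[:k])


def local_overlap_scores(emb_a, emb_b, k):
    """Per-object Jaccard overlap of k-nearest-neighbor sets in two embeddings.

    Only objects present in both embeddings are compared (an assumption of
    this sketch). A score of 1.0 means identical neighborhoods, 0.0 means
    disjoint neighborhoods.
    """
    shared = sorted(set(emb_a) & set(emb_b))
    a = {o: emb_a[o] for o in shared}
    b = {o: emb_b[o] for o in shared}
    scores = {}
    for obj in shared:
        na = nearest_neighbors(a, obj, k)
        nb = nearest_neighbors(b, obj, k)
        union = na | nb
        scores[obj] = len(na & nb) / len(union) if union else 1.0
    return scores


# Hypothetical toy embeddings: only "bus" moves between the two spaces.
emb_a = {"cat": [1.0, 0.0], "dog": [0.985, 0.174],
         "car": [0.0, 1.0], "bus": [0.174, 0.985]}
emb_b = {"cat": [1.0, 0.0], "dog": [0.985, 0.174],
         "car": [0.0, 1.0], "bus": [0.866, 0.5]}

scores = local_overlap_scores(emb_a, emb_b, k=1)
global_summary = mean(list(scores.values()))
```

With these toy vectors, "bus" is the only object whose nearest neighbor changes, so it scores 0.0 while the other three score 1.0, and the mean is 0.75. In a visual system of the kind described above, the per-object scores would feed a detail view and their distribution would feed an overview.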