Embedding is a common technique for analyzing multi-dimensional data. However, an embedding projection does not always form significant and interpretable visual structures that reveal underlying data patterns. We propose an approach that incorporates human knowledge into data embeddings to improve pattern significance and interpretability. The core idea is to (1) externalize tacit human knowledge as explicit sample labels and (2) add a classification loss to the embedding network to encode samples' classes. The approach pulls samples that share a class and have similar data features closer together in the projection, yielding more compact (significant) and class-consistent (interpretable) visual structures. We implement the idea as an embedding network with a customized classification loss and integrate the network into a visualization system, forming a workflow that supports flexible class creation and pattern exploration. Patterns found on open datasets in case studies, subjects' performance in a user study, and quantitative experiment results illustrate the general usability and effectiveness of the approach.
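The two-part idea (labels plus a classification loss added to the embedding objective) can be illustrated with a small sketch. The abstract does not specify the customized classification loss, so everything here is an assumption made for illustration: the structure term is a simple pairwise-distance stress, the classification term is plain softmax cross-entropy over negative squared distances to hypothetical class centers in the 2-D projection, and the names `stress`, `cross_entropy`, `combined_loss`, and `weight` are invented. It only evaluates the loss for a given projection; it is not the authors' network.

```python
import math


def stress(high, low):
    """Mean squared gap between pairwise distances in data space and in the projection."""
    total, pairs = 0.0, 0
    for i in range(len(high)):
        for j in range(i + 1, len(high)):
            total += (math.dist(high[i], high[j]) - math.dist(low[i], low[j])) ** 2
            pairs += 1
    return total / pairs


def cross_entropy(low, labels, centers):
    """Mean softmax cross-entropy of projected points against their labels.

    Assumption: logits are negative squared distances to per-class centers,
    a stand-in for the paper's customized classification loss.
    """
    loss = 0.0
    for point, label in zip(low, labels):
        logits = [-math.dist(point, c) ** 2 for c in centers]
        peak = max(logits)
        # Numerically stable log-sum-exp.
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
        loss -= logits[label] - log_norm
    return loss / len(low)


def combined_loss(high, low, labels, centers, weight=1.0):
    """Structure-preserving term plus a weighted classification term."""
    return stress(high, low) + weight * cross_entropy(low, labels, centers)


# Toy data: four 3-D samples, two user-assigned classes.
high = [[0, 0, 0], [0, 1, 0], [5, 5, 5], [5, 6, 5]]
labels = [0, 0, 1, 1]
centers = [[0.0, 0.0], [5.0, 5.0]]  # hypothetical class centers in the projection

compact = [[0.0, 0.0], [0.0, 0.5], [5.0, 5.0], [5.0, 5.5]]  # classes grouped
mixed = [[0.0, 0.0], [5.0, 5.0], [0.0, 0.5], [5.0, 5.5]]    # classes interleaved

print("classification term, compact:", cross_entropy(compact, labels, centers))
print("classification term, mixed:  ", cross_entropy(mixed, labels, centers))
print("combined loss, compact:", combined_loss(high, compact, labels, centers))
print("combined loss, mixed:  ", combined_loss(high, mixed, labels, centers))
```

In this sketch the classification term is lower for the projection in which same-class samples sit together, which is the pull toward compact, class-consistent structures that the abstract describes. Setting `weight` to zero recovers the structure term alone, so the weight controls how strongly the labels reshape the projection.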