Exploring Multi-dimensional Data via Subset Embedding

Multi-dimensional data exploration is a classic research topic in visualization. Most existing approaches are designed to identify patterns of records in the full dimensional space or in a subspace. In this paper, we propose a visual analytics approach to exploring subset patterns. The core of the approach is a subset embedding network (SEN) that represents a group of subsets as uniformly formatted embeddings. We implement the SEN as multiple subnets with separate loss functions. This design enables the SEN to handle arbitrary subsets and to capture the similarity of subsets on single features, thus achieving accurate pattern exploration, which in most cases means searching for subsets that have similar values on a few features. Moreover, each subnet is a fully connected neural network with one hidden layer, and this simple structure yields high training efficiency. We integrate the SEN into a visualization system that supports a three-step workflow. Specifically, analysts (1) partition the given dataset into subsets, (2) select portions of a projected latent space created using the SEN, and (3) determine whether patterns exist within the selected subsets. Overall, the system combines visualizations, interactions, automatic methods, and quantitative measures to balance exploration flexibility and operation efficiency, and to improve the interpretability and faithfulness of the identified patterns. Case studies and quantitative experiments on multiple open datasets demonstrate the general applicability and effectiveness of our approach.
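
The following is a minimal sketch of the architecture described in the abstract: one small subnet per feature, each with a single hidden layer and its own loss, whose outputs are concatenated into a fixed-length embedding per subset. Several details are assumptions made only for illustration and are not taken from the paper: each subset is summarised as a per-feature histogram, each subnet is trained as an autoencoder with a mean-squared reconstruction loss, and the names `feature_histograms`, `Subnet`, and `embed_subsets` are hypothetical.

```python
import numpy as np


def feature_histograms(data, subsets, bins=8):
    """Summarise each subset as one normalised histogram per feature.

    Assumed input format (not from the paper). Returns an array of shape
    (n_subsets, n_features, bins), so subsets of any size share one layout.
    """
    n_features = data.shape[1]
    out = np.zeros((len(subsets), n_features, bins))
    for j in range(n_features):
        lo, hi = data[:, j].min(), data[:, j].max()
        for i, idx in enumerate(subsets):
            counts, _ = np.histogram(data[idx, j], bins=bins, range=(lo, hi))
            out[i, j] = counts / max(len(idx), 1)
    return out


class Subnet:
    """Fully connected network with one hidden layer, for a single feature."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b2 = np.zeros(n_in)

    def encode(self, x):
        # The hidden activations serve as this feature's part of the embedding.
        return np.tanh(x @ self.w1 + self.b1)

    def loss(self, x):
        # Assumed loss: mean-squared reconstruction error for this feature only.
        y = self.encode(x) @ self.w2 + self.b2
        return float(np.mean((y - x) ** 2))

    def train_step(self, x, lr=0.5):
        # One step of plain gradient descent with hand-written gradients.
        h = self.encode(x)
        y = h @ self.w2 + self.b2
        dy = 2.0 * (y - x) / x.size
        dz = (dy @ self.w2.T) * (1.0 - h ** 2)
        self.w2 -= lr * (h.T @ dy)
        self.b2 -= lr * dy.sum(axis=0)
        self.w1 -= lr * (x.T @ dz)
        self.b1 -= lr * dz.sum(axis=0)


def embed_subsets(data, subsets, n_hidden=4, steps=200, seed=0):
    """Train one subnet per feature and concatenate their hidden codes."""
    rng = np.random.default_rng(seed)
    hists = feature_histograms(data, subsets)
    parts, history = [], []
    for j in range(hists.shape[1]):
        x = hists[:, j, :]
        net = Subnet(x.shape[1], n_hidden, rng)
        before = net.loss(x)
        for _ in range(steps):
            net.train_step(x)
        history.append((before, net.loss(x)))  # per-feature loss before/after
        parts.append(net.encode(x))
    return np.concatenate(parts, axis=1), history
```

Because every subnet sees only one feature and is trained with its own loss, the resulting embedding keeps a separate block of coordinates per feature, which is one plausible way to let similarity on a single feature remain visible after projection.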
