Multivariate data is often visualized using linear projections, produced by techniques such as principal component analysis, linear discriminant analysis, and projection pursuit. A problem with projections is that they obscure low- and high-density regions near the center of the distribution. Sections, or slices, can help to reveal them. This paper develops a section pursuit method, building on the extensive work in projection pursuit, to search for interesting slices of the data. Linear projections are used to define sections of the parameter space and to calculate interestingness by comparing the distribution of observations inside and outside a section. By optimizing this index, it is possible to reveal features such as holes (low density) or grains (high density). The optimization is incorporated into a guided tour so that the search for structure can be dynamic. The approach can be useful when data distributions depart from uniform or normal, as when visually exploring nonlinear manifolds and functions in multivariate space. Two applications of section pursuit are shown: exploring decision boundaries from classification models, and exploring subspaces induced by complex inequality conditions from a multiple-parameter model. The new methods are available in R, in the tourr package.
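The abstract describes the idea only in words: a section is tied to a projection plane, and interestingness compares observations inside and outside that section. The Python sketch below illustrates this under simplifying assumptions; it is not the tourr implementation and not the index defined in the paper. It assumes the section is a slab of half-width `h` around a 2-D plane through `center`, spanned by an orthonormal `basis`. The hypothetical `hole_grain_score` only compares the share of points that project within radius `r_cut` of the center, and `fibonacci_shell` is a hypothetical helper that generates demo data.

```python
import math


def _dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def project_and_distance(x, basis, center):
    """Return (2-D projected coordinates, orthogonal distance to the plane).

    Assumes `basis` holds two orthonormal p-vectors spanning the projection
    plane and `center` is the point the section plane passes through.
    """
    v = [xi - ci for xi, ci in zip(x, center)]
    coords = tuple(_dot(v, a) for a in basis)
    # Squared distance orthogonal to the plane: |v|^2 minus its in-plane part.
    d2 = _dot(v, v) - sum(c * c for c in coords)
    return coords, math.sqrt(max(d2, 0.0))


def hole_grain_score(points, basis, center, h, r_cut):
    """Hypothetical score contrasting points inside vs outside a section.

    A point is inside the section when its orthogonal distance is below h.
    The score is (share of outside points projecting within r_cut of the
    center) minus (the same share for inside points). Positive values hint
    at a hole in the slice, negative values at a grain.
    """
    inside_near = inside_total = outside_near = outside_total = 0
    for x in points:
        (y1, y2), d = project_and_distance(x, basis, center)
        near = math.hypot(y1, y2) < r_cut
        if d < h:
            inside_total += 1
            inside_near += near  # bool counts as 0 or 1
        else:
            outside_total += 1
            outside_near += near
    if inside_total == 0 or outside_total == 0:
        raise ValueError("section must split the data into two non-empty groups")
    return outside_near / outside_total - inside_near / inside_total


def fibonacci_shell(n):
    """Hypothetical demo data: n roughly even points on the unit sphere in 3-D."""
    golden = math.pi * (3 - math.sqrt(5))
    pts = []
    for i in range(n):
        z = 1 - 2 * (i + 0.5) / n
        r = math.sqrt(1 - z * z)
        pts.append((r * math.cos(i * golden), r * math.sin(i * golden), z))
    return pts


if __name__ == "__main__":
    plane = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    # A hollow sphere sliced through its middle: the projection fills the
    # center, but the slice does not, so the score should be positive.
    print(hole_grain_score(fibonacci_shell(500), plane, (0.0, 0.0, 0.0),
                           h=0.15, r_cut=0.5))
```

The example shows why sections matter: in the full projection of a hollow sphere, points from the poles land near the center, while the thin slice through the equator shows the hole directly. A real section pursuit index would optimize such a contrast over many planes with a guided tour rather than evaluate one fixed plane.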