Data visualisation helps in understanding data described by multiple variables, also called features, stored in a large matrix in which individuals are stored in rows and variable values in columns. Such data structures are frequently called multidimensional spaces. In this paper, we illustrate ways of employing the visual results of multidimensional projection algorithms to understand and fine-tune the parameters of their mathematical framework. Mathematical tools common to these approaches include Laplacian matrices, Euclidean distance, cosine distance, and statistical measures such as the Kullback-Leibler divergence, employed to fit probability distributions and reduce dimensions. Two relevant algorithms in the data visualisation field are t-distributed stochastic neighbour embedding (t-SNE) and Least-Square Projection (LSP). These algorithms can be used to explore a range of mathematical functions and their impact on datasets. Here, the mathematical parameters of underlying techniques, such as Principal Component Analysis (PCA) behind t-SNE and mesh reconstruction methods behind LSP, are adjusted to reflect the properties afforded by the mathematical formulation. The results, supported by illustrations of the LSP and t-SNE processes, are meant to inspire students to understand the mathematics behind such methods so that they can apply them effectively in data analysis tasks across multiple applications.
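As a rough illustration of the mathematical tools named above, the following sketch computes the Euclidean distance, the cosine distance, the Kullback-Leibler divergence, and an unnormalised graph Laplacian on toy inputs. It is not taken from the paper: the function names, the toy data, and the choice of the unnormalised Laplacian L = D - W are assumptions made only for illustration, and the code uses the Python standard library alone.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions.

    Assumes p and q each sum to 1 and that q is non-zero wherever p is.
    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def graph_laplacian(weights):
    """Unnormalised graph Laplacian L = D - W of a symmetric weight matrix.

    D is the diagonal matrix of row sums (vertex degrees).
    """
    n = len(weights)
    return [
        [(sum(weights[i]) if i == j else 0.0) - weights[i][j] for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical toy data: two 2-D points, two distributions, a 3-vertex path.
    print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
    print(kl_divergence([0.9, 0.1], [0.5, 0.5]))
    print(graph_laplacian([[0, 1, 0], [1, 0, 1], [0, 1, 0]]))
```

In a projection setting, distance functions of this kind define which points count as neighbours in the original space, while a divergence such as Kullback-Leibler can measure how far the low-dimensional neighbourhood distribution is from the high-dimensional one.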