With the recent surge in big data analytics for hyper-dimensional data, there is renewed interest in dimensionality reduction techniques for machine learning applications. For these methods to deliver performance gains and improve understanding of the underlying data, a proper metric needs to be identified. This step is often overlooked, and metrics are typically chosen without consideration of the underlying geometry of the data. In this paper, we present a method for incorporating elastic metrics into t-distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP). We apply our method to functional data, which is uniquely characterized by rotations, parameterization, and scale. Ignoring these properties can lead to incorrect analysis and poor classification performance. Using our method, we demonstrate improved performance on shape identification tasks for three benchmark data sets (MPEG-7, the Car data set, and the Plane data set of Thakoor), where we achieve F1 scores of 0.77, 0.95, and 1.00, respectively.
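To make the idea of swapping the metric concrete, the following is a minimal sketch, not the implementation used in the paper. It assumes planar curves sampled with the same number of points, represents each curve by its square-root velocity function, and computes a distance that is invariant to scale and rotation. The optimization over reparameterizations that a full elastic metric requires is deliberately omitted here. The function names `srvf`, `shape_distance`, `pairwise_shape_distances`, and `embed_tsne` are hypothetical; the only external call is scikit-learn's `TSNE` with `metric="precomputed"`.

```python
import numpy as np


def srvf(curve):
    """Unit-norm square-root velocity representation of a curve (n_points, dim)."""
    velocity = np.gradient(curve, axis=0)
    speed = np.linalg.norm(velocity, axis=1)
    # Guard against zero speed before dividing by sqrt(speed).
    q = velocity / np.sqrt(np.maximum(speed, 1e-12))[:, None]
    # Normalizing to unit norm removes the effect of scale.
    return q / np.linalg.norm(q)


def shape_distance(curve_a, curve_b):
    """Arc-length distance between SRVFs after optimal rotation.

    Assumption: reparameterization is NOT optimized, so this is only a
    scale- and rotation-invariant stand-in for a full elastic distance.
    """
    qa, qb = srvf(curve_a), srvf(curve_b)
    u, s, vt = np.linalg.svd(qa.T @ qb)
    # Restrict to proper rotations (determinant +1), not reflections.
    if np.linalg.det(u) * np.linalg.det(vt) < 0:
        s[-1] = -s[-1]
    inner = np.clip(s.sum(), -1.0, 1.0)
    return float(np.arccos(inner))


def pairwise_shape_distances(curves):
    """Symmetric matrix of shape distances for a list of curves."""
    n = len(curves)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = shape_distance(curves[i], curves[j])
    return dist


def embed_tsne(dist, perplexity=5.0, seed=0):
    """Embed a precomputed distance matrix in 2-D with scikit-learn's t-SNE."""
    from sklearn.manifold import TSNE  # imported lazily; optional dependency

    # init="random" is needed because PCA init requires raw features.
    tsne = TSNE(n_components=2, metric="precomputed", init="random",
                perplexity=perplexity, random_state=seed)
    return tsne.fit_transform(dist)
```

Given a list `curves` of arrays of shape `(n_points, 2)`, `embed_tsne(pairwise_shape_distances(curves))` would return one 2-D point per curve, provided `perplexity` is smaller than the number of curves. The same distance matrix could, in principle, be passed to a UMAP implementation that accepts precomputed distances.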