An Empirical Evaluation of the t-SNE Algorithm for Data Visualization in Structural Engineering

A fundamental task in machine learning is visualizing high-dimensional data sets that arise in high-impact application domains. The problem becomes considerably harder when the data are large and imbalanced. In this paper, the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm is used to reduce the dimensionality of an earthquake-engineering data set for visualization. Because class imbalance greatly affects classifier accuracy, we employ the Synthetic Minority Oversampling Technique (SMOTE) to address the imbalance in this data set. We present the results obtained with t-SNE and SMOTE and compare them with baseline approaches from several perspectives. Considering four options and six classification algorithms, we show that, with t-SNE applied to the imbalanced data and SMOTE applied to the training set, neural network classifiers give promising results without sacrificing accuracy. Hence, the studied scientific data can be transformed into a two-dimensional (2D) space, which makes it possible to visualize the classifier and the resulting decision surface in a 2D plot.
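The oversampling step mentioned in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation; it is a minimal, standard-library-only illustration of the core SMOTE idea, in which each synthetic minority sample is an interpolation between an existing minority sample and one of its nearest minority-class neighbours. The function name `smote_sketch`, its parameters, and the toy data are assumptions introduced for illustration only.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority-class neighbours (the SMOTE idea).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the starting point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to it.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # The new point lies at a random position on the segment
        # from the base sample to the chosen neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with three 2D points (made-up data).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_sketch(minority, n_new=5, k=2)
print(len(new_points))
```

In the setting the abstract describes, this kind of oversampling is applied to the training set only, so the data held out for evaluation keep their original class distribution.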


