Data are not only ubiquitous in society but also increasingly complex in both size and dimensionality. Dimension reduction offers researchers and scholars the ability to make such complex, high-dimensional data spaces simpler and more manageable. This Element offers readers a suite of modern unsupervised dimension reduction techniques, along with hundreds of lines of R code, to efficiently represent the original high-dimensional data space in a simplified, lower-dimensional subspace. Launching from the earliest dimension reduction technique, principal components analysis (PCA), and using real social science data, I introduce and walk readers through the application of the following techniques: locally linear embedding (LLE), t-distributed stochastic neighbor embedding (t-SNE), uniform manifold approximation and projection (UMAP), self-organizing maps (SOM), and deep autoencoders. The result is a well-stocked toolbox of unsupervised algorithms for tackling the complexities of high-dimensional data so common in modern society. All code is publicly accessible on GitHub.