High-dimensional data sets are often analyzed and explored via the construction of a latent low-dimensional space, which enables convenient visualization and efficient predictive modeling or clustering. For complex data structures, linear dimensionality reduction techniques like PCA may not be sufficiently flexible to yield a good low-dimensional representation. Non-linear dimension reduction techniques, like kernel PCA and autoencoders, suffer from a loss of interpretability, since each latent variable depends on all input dimensions. To address this limitation, we here present path lasso penalized autoencoders. This structured regularization enhances interpretability by penalizing each path through the encoder from an input to a latent variable, thus restricting how many input variables are represented in each latent dimension. Our algorithm uses a group lasso penalty and non-negative matrix factorization to construct a sparse, non-linear latent representation. We compare the path lasso regularized autoencoder to PCA, sparse PCA, autoencoders, and sparse autoencoders on real and simulated data sets. We show that the algorithm achieves much lower reconstruction error than sparse PCA and parameter-wise lasso regularized autoencoders for low-dimensional representations. Moreover, path lasso representations provide a more accurate reconstruction match, i.e., better preservation of the relative distances between objects in the original and reconstructed spaces.
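The path-wise grouping described above can be illustrated with a minimal NumPy sketch for a two-layer encoder. This is an assumption-laden illustration, not the paper's exact formulation: it groups, for each (input, latent) pair, the products of the weights along every path connecting them, and takes an L2 norm over each group so that whole input-to-latent paths are shrunk to zero jointly.

```python
import numpy as np

def path_group_penalty(W1, W2):
    """Illustrative path-wise group lasso penalty (a sketch, not the
    paper's exact algorithm).

    W1: (d, h) weights, input -> hidden layer.
    W2: (h, k) weights, hidden -> latent layer.
    Each group (i, j) collects the weight products along all h paths
    from input i to latent variable j; the L2 norm over each group
    encourages entire input->latent paths to vanish together, so that
    each latent dimension depends on only a few inputs.
    """
    d, h = W1.shape
    h2, k = W2.shape
    assert h == h2, "hidden dimensions must match"
    # path_prod[i, m, j] = W1[i, m] * W2[m, j]: weight product along
    # the path input i -> hidden unit m -> latent j
    path_prod = W1[:, :, None] * W2[None, :, :]
    # group lasso: sum over (i, j) pairs of the L2 norm across paths m
    return np.sqrt((path_prod ** 2).sum(axis=1)).sum()
```

In training, a penalty of this form would be added to the reconstruction loss with a tuning weight; for deeper encoders the product would extend over all layers on each path.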