We discuss probabilistic neural network models for unsupervised learning where the distribution of the hidden layer is fixed. We argue that learning machines with this architecture enjoy a number of desirable properties. For example, the model can be chosen as a simple and interpretable one, it does not need to be over-parametrised, and training is argued to be efficient in a thermodynamic sense. When hidden units are binary variables, these models have a natural interpretation in terms of features. We show that the featureless state corresponds to a state of maximal ignorance about the features and that learning the first feature depends on non-Gaussian statistical properties of the data. We suggest that the distribution of hidden variables should be chosen according to the principle of maximal relevance. We introduce the Hierarchical Feature Model as an example of a model that satisfies this principle and that encodes an a priori organisation of the feature space. We present extensive numerical experiments in order i) to test whether the internal representation of learning machines can indeed be independent of the data with which they are trained and ii) to show that only a finite number of features are needed to describe a dataset.