Of particular interest is the discovery of useful representations solely from observations, in an unsupervised generative manner. However, despite their strong ability for sample generation and density estimation, the question of whether existing normalizing flows provide effective representations for downstream tasks remains largely unanswered. This paper investigates this question for a family of generative models that admits exact invertibility. We propose Neural Principal Component Analysis (Neural-PCA), which operates in full dimensionality while capturing principal components in \emph{descending} order. Without exploiting any label information, the recovered principal components store the most informative elements in their \emph{leading} dimensions and leave the negligible ones to the \emph{trailing} dimensions, yielding clear performance improvements of $5\%$--$10\%$ in downstream tasks. These improvements are empirically consistent irrespective of the number of trailing latent dimensions dropped. Our work suggests that the necessary inductive bias should be introduced into generative modelling when representation quality is of interest.
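To make the central idea concrete, the following is a minimal sketch (not the paper's implementation) of how a representation ordered by informativeness can be truncated for downstream use: a latent code is obtained from an exactly invertible map, and only its leading dimensions are kept as features. The orthogonal linear "flow" and the function name \texttt{flow\_forward} are hypothetical stand-ins for a trained Neural-PCA flow.

\begin{verbatim}
# Minimal sketch, assuming a trained invertible flow; here a random
# orthogonal linear map stands in for it (names are hypothetical).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))   # toy observations, no labels used

# Stand-in invertible transform: orthogonal, hence exactly invertible.
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))

def flow_forward(x):
    """z = f(x); the inverse is x = z @ Q.T."""
    return x @ Q

Z = flow_forward(X)

# Neural-PCA's premise: informative content concentrates in the
# leading latent dimensions, so trailing ones can be dropped with
# little loss for downstream tasks.
k = 8                             # number of leading dimensions kept
features = Z[:, :k]               # downstream representation
print(features.shape)             # (1000, 8)
\end{verbatim}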