Many real-world datasets live on high-dimensional Stiefel and Grassmannian manifolds, $V_k(\mathbb{R}^N)$ and $Gr(k, \mathbb{R}^N)$ respectively, and benefit from projection onto lower-dimensional Stiefel and Grassmannian manifolds. In this work, we propose an algorithm called \textit{Principal Stiefel Coordinates (PSC)} to reduce data dimensionality from $ V_k(\mathbb{R}^N)$ to $V_k(\mathbb{R}^n)$ in an \textit{$O(k)$-equivariant} manner ($k \leq n \ll N$). We begin by observing that each element $\alpha \in V_n(\mathbb{R}^N)$ defines an isometric embedding of $V_k(\mathbb{R}^n)$ into $V_k(\mathbb{R}^N)$. Next, we describe two ways of finding a suitable embedding map $\alpha$: one via an extension of principal component analysis ($\alpha_{PCA}$), and one that further minimizes data fit error using gradient descent ($\alpha_{GD}$). Then, we define a continuous and $O(k)$-equivariant map $\pi_\alpha$ that acts as a "closest point operator" to project the data onto the image of $V_k(\mathbb{R}^n)$ in $V_k(\mathbb{R}^N)$ under the embedding determined by $\alpha$, while minimizing distortion. Because this dimensionality reduction is $O(k)$-equivariant, these results extend to Grassmannian manifolds as well. Lastly, we show that $\pi_{\alpha_{PCA}}$ globally minimizes projection error in a noiseless setting, while $\pi_{\alpha_{GD}}$ achieves a meaningfully different and improved outcome when the data does not lie exactly on the image of a linearly embedded lower-dimensional Stiefel manifold as above. Multiple numerical experiments using synthetic and real-world data are performed.
翻译:暂无翻译