Embedding methods for product spaces are powerful techniques for low-distortion and low-dimensional representation of complex data structures. Nevertheless, little is known about downstream learning and optimization problems in such spaces. Here, we address the problem of linear classification in a product space form -- a mix of Euclidean, spherical, and hyperbolic spaces. First, we describe new formulations for linear classifiers on a Riemannian manifold using geodesics and Riemannian metrics, which generalize straight lines and inner products in vector spaces, respectively. Second, we prove that linear classifiers in $d$-dimensional space forms of any curvature have the same expressive power, i.e., they can shatter exactly $d+1$ points. Third, we formalize linear classifiers in product space forms, describe the first corresponding perceptron and SVM classification algorithms, and establish rigorous convergence results for the former. We support our theoretical findings with simulation results on several datasets, including synthetic data, CIFAR-100, MNIST, Omniglot, and single-cell RNA sequencing data. The results show that learning methods applied to low-dimensional embeddings in product space forms outperform their algorithmic counterparts in each individual space form.
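To make the idea of replacing Euclidean inner products with Riemannian metrics concrete, the sketch below illustrates one standard ingredient: a linear decision rule in the hyperboloid (Lorentz) model of hyperbolic space, where the Euclidean inner product is replaced by the Minkowski inner product. This is a minimal illustration under common conventions, not the paper's actual algorithm; the function names (`minkowski_inner`, `hyperbolic_decision`, `lift`) are our own.

```python
import numpy as np

def minkowski_inner(u, v):
    # Minkowski (Lorentzian) inner product: -u0*v0 + <u_1:, v_1:>.
    # Plays the role that the Euclidean inner product plays in R^d.
    return -u[0] * v[0] + np.dot(u[1:], v[1:])

def lift(x_eucl):
    # Lift a Euclidean point onto the unit hyperboloid:
    # x0 = sqrt(1 + ||x||^2), so that <x, x>_M = -1.
    x_eucl = np.asarray(x_eucl, dtype=float)
    return np.concatenate(([np.sqrt(1.0 + x_eucl @ x_eucl)], x_eucl))

def hyperbolic_decision(w, x):
    # A "linear" classifier on the hyperboloid: the decision boundary
    # {x : <w, x>_M = 0} is a totally geodesic hypersurface,
    # the hyperbolic analogue of a separating hyperplane.
    return np.sign(minkowski_inner(w, x))
```

For a product space form, a classifier of this kind combines the per-factor signed distances (Euclidean, spherical, and hyperbolic) into a single decision score; the sketch above covers only the hyperbolic factor.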