Many high-dimensional and large-volume data sets of practical relevance have hierarchical structures induced by trees, graphs or time series. Such data sets are hard to process in Euclidean spaces and one often seeks low-dimensional embeddings in other space forms to perform the required learning tasks. For hierarchical data, the space of choice is a hyperbolic space, since it guarantees low-distortion embeddings for tree-like structures. Unfortunately, the geometry of hyperbolic spaces has properties not encountered in Euclidean spaces that pose challenges when trying to rigorously analyze algorithmic solutions. Here, for the first time, we establish a unified framework for learning scalable and simple hyperbolic linear classifiers with provable performance guarantees. The gist of our approach is to focus on Poincar\'e ball models and formulate the classification problems using tangent space formalisms. Our results include a new hyperbolic and second-order perceptron algorithm as well as an efficient and highly accurate convex optimization setup for hyperbolic support vector machine classifiers. All algorithms provably converge and are highly scalable, as they have complexities comparable to those of their Euclidean counterparts. Their performance accuracies are demonstrated on synthetic data sets comprising millions of points, as well as on complex real-world data sets such as single-cell RNA-seq expression measurements, CIFAR10, Fashion-MNIST and mini-ImageNet.
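To make the tangent-space formalism concrete, the following is a minimal sketch (not the paper's actual implementation) of how a linear decision rule can be applied to points in a unit-curvature Poincar\'e ball: points are pulled back to the tangent space at the origin via the logarithmic map, where an ordinary linear classifier operates. The function names `log0`, `exp0`, and `tangent_classify` are illustrative choices, not names from the paper.

```python
import numpy as np

def log0(x, eps=1e-9):
    """Logarithmic map at the origin of the unit-curvature Poincare ball:
    sends a point x with ||x|| < 1 to the tangent space at the origin."""
    n = np.linalg.norm(x)
    return np.arctanh(min(n, 1.0 - eps)) * x / max(n, eps)

def exp0(v, eps=1e-9):
    """Exponential map at the origin: the inverse of log0."""
    n = np.linalg.norm(v)
    return np.tanh(n) * v / max(n, eps)

def tangent_classify(w, x):
    """Euclidean-style linear decision rule evaluated in the tangent
    space at the origin: sign(<w, log0(x)>)."""
    return np.sign(w @ log0(x))
```

Because `log0` is a bijection from the ball onto the tangent space, a perceptron-style update can be run entirely on the mapped points, which is what makes the complexity comparable to the Euclidean case.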