We introduce an algorithm, Cayley transform ellipsoid fitting (CTEF), that uses the Cayley transform to fit ellipsoids to noisy data in any dimension. Unlike many ellipsoid fitting methods, CTEF is ellipsoid-specific, meaning it always returns elliptic solutions, and it can fit arbitrary ellipsoids. It also outperforms other fitting methods when data are not uniformly distributed over the surface of an ellipsoid. Inspired by calls for interpretable and reproducible methods in machine learning, we apply CTEF to dimension reduction, data visualization, and clustering. Because CTEF captures global curvature, it can extract nonlinear features in data that other methods fail to identify. We illustrate this with dimension reduction on human cell cycle data and with clustering on classical toy examples. In the latter case, CTEF outperforms 10 popular clustering algorithms.
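To make the role of the Cayley transform concrete, the following is a minimal sketch, not the CTEF implementation itself. It relies on one standard fact: for any skew-symmetric matrix S, the Cayley transform Q = (I - S)(I + S)^{-1} is a rotation matrix. An ellipsoid's shape matrix can therefore be built from unconstrained parameters and is positive definite by construction, which is one plausible way to obtain an ellipsoid-specific parametrization. The function names `cayley`, `skew_from_params`, and `ellipsoid_matrix`, and the choice of parametrizing by semi-axis lengths, are illustrative assumptions and do not come from the abstract.

```python
import numpy as np

def cayley(S):
    """Cayley transform of a skew-symmetric matrix S: Q = (I - S)(I + S)^{-1}."""
    I = np.eye(S.shape[0])
    # I + S is invertible for every real skew-symmetric S, because the
    # eigenvalues of S are purely imaginary. (I - S) and (I + S)^{-1} commute,
    # so solving (I + S) Q = (I - S) yields the same Q.
    return np.linalg.solve(I + S, I - S)

def skew_from_params(theta, p):
    """Build a p x p skew-symmetric matrix from p(p-1)/2 free parameters."""
    S = np.zeros((p, p))
    S[np.triu_indices(p, k=1)] = theta
    return S - S.T

def ellipsoid_matrix(theta, axes):
    """Shape matrix A = Q diag(1/a_i^2) Q^T of the ellipsoid (x-c)^T A (x-c) = 1.

    Assumed parametrization for illustration: rotation angles enter through
    the Cayley transform, and `axes` holds the positive semi-axis lengths.
    """
    axes = np.asarray(axes, dtype=float)
    Q = cayley(skew_from_params(theta, len(axes)))
    return Q @ np.diag(1.0 / axes**2) @ Q.T

# Usage: any real theta gives a valid rotation, so A is always positive definite.
theta = np.array([0.3, -1.2, 0.7])
A = ellipsoid_matrix(theta, axes=[1.0, 2.0, 4.0])
```

Because every real parameter vector maps to a valid rotation, an optimizer working in these coordinates cannot leave the set of ellipsoids. Note that the Cayley transform does not reach rotations with eigenvalue -1, so a practical method needs some way to handle that case.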