Multidimensional scaling (MDS) is a widely used approach to representing high-dimensional, dependent data. MDS works by assigning each observation a location on a low-dimensional geometric manifold, with distance on the manifold representing similarity. We propose a Bayesian approach to multidimensional scaling when the low-dimensional manifold is hyperbolic. Using hyperbolic space facilitates representing tree-like structure common in many settings (e.g. text or genetic data with hierarchical structure). A Bayesian approach provides regularization that minimizes the impact of uncertainty or measurement error in the observed data. We also propose a case-control likelihood approximation that allows for efficient sampling from the posterior in larger data settings, reducing computational complexity from approximately $O(n^2)$ to $O(n)$. We evaluate the proposed method against state-of-the-art alternatives using simulations, canonical reference datasets, and human gene expression data.
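To make the geometric idea concrete, the following is a minimal sketch of hyperbolic distances and a squared-error stress between embedded and observed dissimilarities. It rests on several assumptions not stated in the abstract: the hyperboloid (Lorentz) model with curvature $-1$, a plain least-squares stress in place of the Bayesian likelihood, and simple uniform subsampling of pairs as a stand-in for the case-control approximation, whose exact form is not given here. All function names are hypothetical.

```python
import numpy as np

def lift_to_hyperboloid(z):
    """Map coordinates z in R^d onto the hyperboloid -x0^2 + |x_rest|^2 = -1, x0 > 0."""
    z = np.asarray(z, dtype=float)
    x0 = np.sqrt(1.0 + np.sum(z * z, axis=-1, keepdims=True))
    return np.concatenate([x0, z], axis=-1)

def _lorentz_signs(dim):
    # Signature (-, +, ..., +) of the Lorentzian inner product.
    signs = np.ones(dim)
    signs[0] = -1.0
    return signs

def hyperbolic_distance_matrix(z):
    """All pairwise hyperbolic distances: d(x, y) = arccosh(-<x, y>_L)."""
    x = lift_to_hyperboloid(z)
    inner = (x * _lorentz_signs(x.shape[-1])) @ x.T
    # Clip guards against arguments slightly below 1 caused by round-off.
    return np.arccosh(np.clip(-inner, 1.0, None))

def full_stress(z, delta):
    """Sum of squared errors over all n(n-1)/2 unordered pairs."""
    d = hyperbolic_distance_matrix(z)
    iu = np.triu_indices(len(d), k=1)
    return np.sum((d[iu] - delta[iu]) ** 2)

def subsampled_stress(z, delta, m, rng):
    """Unbiased estimate of full_stress using m random partners per point.

    Assumption: uniform subsampling, used here only to illustrate how the
    number of evaluated pairs can drop from O(n^2) to O(n * m).
    """
    x = lift_to_hyperboloid(z)
    n = x.shape[0]
    signs = _lorentz_signs(x.shape[-1])
    total = 0.0
    for i in range(n):
        # Draw m distinct partners j != i.
        j = rng.choice(n - 1, size=m, replace=False)
        j[j >= i] += 1
        inner = (x[j] * signs) @ x[i]
        d = np.arccosh(np.clip(-inner, 1.0, None))
        # Reweight so the expectation equals the sum over all j != i.
        total += (n - 1) / m * np.sum((d - delta[i, j]) ** 2)
    # Each unordered pair is reachable from both endpoints, hence the factor 1/2.
    return 0.5 * total
```

With `m` held fixed as `n` grows, the subsampled version evaluates a number of pairs proportional to `n`. Setting `m = n - 1` recovers the full stress exactly, which gives a quick sanity check.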