This study proposes median consensus embedding (MCE) to address the variability in low-dimensional embeddings caused by random initialization in nonlinear dimensionality reduction techniques such as $t$-distributed stochastic neighbor embedding. MCE is defined as the geometric median of multiple embeddings. Treating the embeddings as independent and identically distributed random samples and applying large deviation theory, we prove that MCE is consistent and converges at an exponential rate. Furthermore, we develop a practical algorithm for computing MCE by constructing a distance function between embeddings based on the Frobenius norm of the pairwise distance matrix of the data points. Application to real data demonstrates that MCE converges rapidly and effectively reduces instability. We further combine MCE with multiple imputation to handle missing values and extend it to multiscale hyperparameters. The results confirm that MCE effectively mitigates the instability of embedding methods that arises from random initialization and other sources.
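To make the construction concrete, the following is a minimal sketch rather than the algorithm developed in the study. It assumes that the distance between two embeddings is the Frobenius norm of the difference between their pairwise distance matrices, and it approximates the geometric median in the space of distance matrices with the standard Weiszfeld iteration, which is a substitute chosen here for illustration. All function names and parameters (`pairwise_distances`, `embedding_distance`, `geometric_median_of_distance_matrices`, `n_iter`, `tol`, `eps`) are hypothetical.

```python
import numpy as np


def pairwise_distances(Y):
    """Euclidean distance matrix between the rows of Y (n points x d dims)."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def embedding_distance(Y1, Y2):
    """Assumed distance between embeddings: Frobenius norm of the
    difference between their pairwise distance matrices."""
    return np.linalg.norm(pairwise_distances(Y1) - pairwise_distances(Y2))


def geometric_median_of_distance_matrices(embeddings, n_iter=200, tol=1e-9, eps=1e-12):
    """Weiszfeld iteration over the distance matrices of the embeddings.

    Returns an (n x n) matrix that approximately minimizes the sum of
    Frobenius distances to the input distance matrices.
    """
    D = np.stack([pairwise_distances(Y) for Y in embeddings])  # (m, n, n)
    median = D.mean(axis=0)  # start from the arithmetic mean
    for _ in range(n_iter):
        # Frobenius distance from the current estimate to each matrix.
        dists = np.linalg.norm(D - median, axis=(1, 2))
        # Inverse-distance weights; eps guards against division by zero.
        weights = 1.0 / np.maximum(dists, eps)
        new_median = np.tensordot(weights, D, axes=1) / weights.sum()
        converged = np.linalg.norm(new_median - median) < tol
        median = new_median
        if converged:
            break
    return median
```

Because this distance depends only on pairwise distances, it is unchanged by rotating or translating an embedding, so runs that differ only by such transformations are treated as identical. Recovering low-dimensional coordinates from the median matrix would need an additional step (for example, multidimensional scaling), which this sketch leaves out.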