Algorithmic feature learners provide high-dimensional vector representations for non-matrix-structured signals, such as images, audio, text, and graphs. Low-dimensional projections derived from these representations can be used to explore variation across collections of these data. However, it is not clear how to assess the uncertainty associated with these projections. We adapt methods developed for bootstrapping principal components analysis to the setting where features are learned from non-matrix data. We empirically compare the derived confidence regions in simulations, varying factors that influence both feature learning and the bootstrap. The approaches are illustrated on spatial proteomic data. Code, data, and trained models are released as an R compendium.
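To make the idea of bootstrapping a low-dimensional projection concrete, the following is a minimal Python sketch, not the authors' implementation (their release is an R compendium). It assumes the learned features are already available as an n-by-p array and are held fixed, so it does not capture variability from retraining the feature learner. Each bootstrap replicate recomputes the principal axes from resampled rows, projects all samples onto them, and aligns the result to the full-data projection with an orthogonal Procrustes rotation. The function names, the choice of Procrustes alignment, and all parameters are illustrative assumptions.

```python
import numpy as np


def pca_scores(X, k=2):
    """Top-k principal component scores of the rows of X (n x p)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]


def procrustes_align(A, target):
    """Apply the orthogonal matrix R minimising ||A R - target||_F to A."""
    U, _, Vt = np.linalg.svd(A.T @ target)
    return A @ (U @ Vt)


def bootstrap_projections(features, k=2, n_boot=200, seed=0):
    """Return the reference projection and an (n_boot, n, k) array of
    aligned bootstrap projections of every sample.

    Assumption: `features` is a fixed n x p array of learned features;
    only the PCA step is resampled here.
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    reference = pca_scores(features, k)
    boots = np.empty((n_boot, n, k))
    for b in range(n_boot):
        # Resample rows with replacement and refit the principal axes.
        idx = rng.integers(0, n, size=n)
        Xb = features[idx]
        mu = Xb.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xb - mu, full_matrices=False)
        # Project all original samples onto the bootstrap axes.
        scores = (features - mu) @ Vt[:k].T
        # Remove arbitrary rotations and sign flips before comparing.
        boots[b] = procrustes_align(scores, reference)
    return reference, boots
```

For a given sample i, the cloud `boots[:, i, :]` can then be summarised, for example by its covariance, to draw an approximate confidence region around `reference[i]`. The alignment step matters because principal axes are only defined up to sign and, when eigenvalues are close, up to rotation; without it the spread across replicates would be overstated.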