Rapid progress in representation learning has led to a proliferation of embedding models, and to associated challenges of model selection and practical application. It is non-trivial to assess a model's generalizability to new, candidate datasets, and failure to generalize may lead to poor performance on downstream tasks. Distribution shifts are one cause of reduced generalizability, and they are often difficult to detect in practice. In this paper, we use embedding-space geometry to propose a non-parametric framework for detecting distribution shifts, and we specify two tests. The first test detects shifts by establishing a robustness boundary, determined by an intelligible performance criterion, for comparing reference and candidate datasets. The second test detects shifts by featurizing and classifying multiple subsamples of the two datasets as in-distribution or out-of-distribution. In evaluation, both tests detect model-impacting distribution shifts across a variety of shift scenarios, on both synthetic and real-world datasets.
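To make the second test concrete, below is a minimal sketch of a classifier-based two-sample test on embedding subsamples, in the spirit described above. The function name `detect_shift`, the mean-pooled subsample featurization, and all parameter values are illustrative assumptions, not the paper's actual procedure.

```python
# Illustrative sketch (not the paper's exact method): subsample two
# embedding sets, featurize each subsample, and train a classifier to
# separate reference from candidate. Above-chance accuracy suggests a shift.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def detect_shift(ref_emb, cand_emb, n_subsamples=20, subsample_size=256,
                 threshold=0.6, seed=0):
    """Return (shift_detected, mean_cv_accuracy) for two embedding arrays."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for _ in range(n_subsamples):
        ref = ref_emb[rng.choice(len(ref_emb), subsample_size, replace=False)]
        cand = cand_emb[rng.choice(len(cand_emb), subsample_size, replace=False)]
        # Featurize each subsample; here, simply by its mean embedding.
        X.extend([ref.mean(axis=0), cand.mean(axis=0)])
        y.extend([0, 1])  # 0 = reference (in-distribution), 1 = candidate
    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          np.array(X), np.array(y), cv=5).mean()
    return acc > threshold, acc
```

If the candidate data are drawn from the same distribution as the reference data, the classifier should perform near chance (accuracy ≈ 0.5); accuracy well above the threshold indicates the subsamples are separable and hence a likely shift.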