Domain divergence plays a significant role in estimating the performance of a model in new domains. While there is a significant body of literature on divergence measures, researchers find it hard to choose an appropriate divergence for a given NLP application. We address this shortcoming by both surveying the literature and conducting an empirical study. We develop a taxonomy of divergence measures consisting of three classes -- Information-theoretic, Geometric, and Higher-order measures -- and identify the relationships between them. Further, to understand the common use-cases of these measures, we recognise three novel applications -- 1) Data Selection, 2) Learning Representations, and 3) Decisions in the Wild -- and use them to organise our literature. From this, we identify that Information-theoretic measures are prevalent for 1) and 3), while Higher-order measures are more common for 2). To further help researchers choose appropriate measures to predict the drop in performance -- an important aspect of Decisions in the Wild -- we perform a correlation analysis spanning 130 domain adaptation scenarios, 3 varied NLP tasks, and 12 divergence measures identified from our survey. To calculate these divergences, we consider current contextual word representations (CWR) and contrast them with older distributed representations. We find that traditional measures over word distributions still serve as strong baselines, while higher-order measures with CWR are effective.
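As an illustration of the "traditional measures over word distributions" mentioned above, the following is a minimal sketch (not the paper's implementation) of one information-theoretic measure: the Jensen-Shannon divergence between the unigram word distributions of a source and a target domain. The toy corpora, the `unigram_distribution` helper, and the add-one smoothing are assumptions made for the example.

```python
# Minimal sketch: Jensen-Shannon divergence between two domains' unigram
# word distributions. Not the paper's code; illustrative assumptions only.
from collections import Counter
import numpy as np
from scipy.spatial.distance import jensenshannon  # returns the JS *distance* (sqrt of divergence)

def unigram_distribution(corpus, vocab):
    """Estimate a smoothed unigram probability distribution over a shared vocabulary."""
    counts = Counter(tok for doc in corpus for tok in doc.split())
    probs = np.array([counts[w] for w in vocab], dtype=float)
    probs += 1.0  # add-one smoothing so both distributions share full support
    return probs / probs.sum()

# Hypothetical source and target domain corpora.
source = ["the model performs well on news text", "headlines report the event"]
target = ["the patient was administered the prescribed dose"]
vocab = sorted({tok for doc in source + target for tok in doc.split()})

p = unigram_distribution(source, vocab)
q = unigram_distribution(target, vocab)
js_divergence = jensenshannon(p, q) ** 2  # square the distance to recover the divergence
print(f"JS divergence between domains: {js_divergence:.4f}")
```

In a correlation analysis like the one described above, such a divergence score for each source-target pair would be correlated with the observed drop in task performance; geometric and higher-order variants replace the unigram distributions with distances computed over distributed or contextual representations.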