The recent success of distributed word representations has led to increased interest in analyzing the properties of their spatial distribution. Several studies have suggested that contextualized word embedding models do not project tokens isotropically into vector space. However, current methods designed to measure isotropy, such as average random cosine similarity and the partition score, have not been thoroughly analyzed and are not appropriate for measuring isotropy. We propose IsoScore: a novel tool that quantifies the degree to which a point cloud uniformly utilizes the ambient vector space. Using rigorously designed tests, we demonstrate that IsoScore is the only tool available in the literature that accurately measures how uniformly variance is distributed across the dimensions of a vector space. Additionally, we use IsoScore to challenge a number of recent conclusions in the NLP literature that were derived using brittle metrics of isotropy. We caution future studies against using existing tools to measure isotropy in contextualized embedding space, as the resulting conclusions will be misleading or altogether inaccurate.
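To illustrate why average random cosine similarity can be a poor proxy for isotropy, the following is a minimal sketch (not the IsoScore algorithm, whose definition is given in the paper) that compares it against a simple variance-uniformity statistic on synthetic Gaussian point clouds. The uniformity statistic is a normalized participation ratio of the per-axis variances, a standard effective-dimension measure swapped in here for illustration. All function names are hypothetical. The sketch reads variances along the coordinate axes, so it assumes the cloud's principal axes are already aligned with the coordinates (true for the synthetic data below); a general measure would first rotate the data into its principal-component basis.

```python
import math
import random


def average_random_cosine_similarity(points, num_pairs=5000, seed=0):
    """Mean cosine similarity over randomly sampled pairs of distinct points."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_pairs):
        a, b = rng.sample(points, 2)  # two points at distinct positions
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        total += dot / (norm_a * norm_b)
    return total / num_pairs


def coordinate_variances(points):
    """Population variance of each coordinate axis."""
    n = len(points)
    dim = len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(dim)]
    return [sum((p[j] - means[j]) ** 2 for p in points) / n for j in range(dim)]


def variance_uniformity(points):
    """Normalized participation ratio of per-axis variances, in [0, 1].

    1.0: variance is spread evenly over all axes; 0.0: it all sits on one axis.
    Hypothetical helper; assumes principal axes align with the coordinate axes.
    """
    v = coordinate_variances(points)
    dim = len(v)
    pr = sum(v) ** 2 / sum(x * x for x in v)  # effective number of dims, 1..dim
    return (pr - 1) / (dim - 1)


def gaussian_cloud(stds, mean=0.0, n_points=2000, seed=0):
    """Axis-aligned Gaussian cloud with per-axis standard deviations `stds`."""
    rng = random.Random(seed)
    return [[rng.gauss(mean, s) for s in stds] for _ in range(n_points)]


if __name__ == "__main__":
    DIM = 10
    clouds = {
        "isotropic": gaussian_cloud([1.0] * DIM),
        "one dominant axis": gaussian_cloud([10.0] + [0.1] * (DIM - 1)),
        "isotropic, shifted mean": gaussian_cloud([1.0] * DIM, mean=5.0),
    }
    for name, pts in clouds.items():
        print(f"{name:24s} cos={average_random_cosine_similarity(pts):+.3f} "
              f"uniformity={variance_uniformity(pts):.3f}")
```

In this setup, the isotropic cloud and the nearly one-dimensional cloud both have an average random cosine similarity close to zero, while the shifted cloud scores close to 1 even though its variance is spread evenly across all axes. Cosine similarity thus reacts to the mean offset rather than to how variance is distributed, which is the property a variance-based statistic is meant to capture.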