An approach is proposed to quantify, in bits of information, the actual relevance of analogies in analogy tests. Its main component is a soft-accuracy estimator that also yields bias-compensated entropy estimates. Experimental results obtained with pre-trained 300-dimensional GloVe vectors and two public analogy test sets show that, from an information-content perspective, proximity hints are much more relevant than analogies in analogy tests. Accordingly, a simple word embedding model is used to predict that analogies carry about one bit of information, a prediction that is experimentally corroborated.
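To make "bits of information" concrete, the toy sketch below scores the candidates of a single analogy question a:b::c:? in two ways: by proximity to c alone, and by the common vector-offset rule (similarity to b - a + c). Each score set is turned into probabilities with a softmax, and the cost of the correct answer is measured as -log2 p(correct). The gap between the two costs is a rough, per-question indication of how much the analogy adds beyond the proximity hint. This is an illustration under stated assumptions, not the soft-accuracy estimator described above. The vectors, the temperature, and the helper names (`cosine`, `softmax_bits`, `analogy_bits`) are hypothetical, and no bias compensation is applied.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv)


def softmax_bits(scores, target, temperature=0.1):
    """Return -log2 p(target) under a softmax over the candidate scores.

    The temperature is an arbitrary, hypothetical choice for illustration.
    """
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {w: math.exp((s - m) / temperature) for w, s in scores.items()}
    z = sum(exps.values())
    return -math.log2(exps[target] / z)


def analogy_bits(emb, a, b, c, d, temperature=0.1):
    """Bits needed to single out d for a:b::c:?, without and with the offset."""
    # Question words are excluded from the candidates, as is usual in analogy tests.
    candidates = [w for w in emb if w not in (a, b, c)]
    # Proximity hint only: rank candidates by similarity to c.
    prox = {w: cosine(emb[w], emb[c]) for w in candidates}
    # Vector offset (3CosAdd): rank candidates by similarity to b - a + c.
    target = [bb - aa + cc for aa, bb, cc in zip(emb[a], emb[b], emb[c])]
    offset = {w: cosine(emb[w], target) for w in candidates}
    return (softmax_bits(prox, d, temperature),
            softmax_bits(offset, d, temperature))


# Hypothetical 3-D toy vectors; real experiments would use, e.g., GloVe 300-D.
emb = {
    "man": (1.0, 0.0, 0.0),
    "woman": (1.0, 1.0, 0.0),
    "king": (1.0, 0.0, 1.0),
    "queen": (1.0, 1.0, 1.0),
    "prince": (0.9, 0.0, 1.0),
    "apple": (0.0, 0.2, -1.0),
}

prox_bits, offset_bits = analogy_bits(emb, "man", "woman", "king", "queen")
print(f"proximity only: {prox_bits:.2f} bits, with offset: {offset_bits:.2f} bits")
```

In this contrived example "prince" sits closer to "king" than "queen" does, so the proximity-only rule pays more bits for the correct answer than the offset rule. With real embeddings and full test sets, averaging such per-question costs would still be biased by finite vocabularies and sample sizes, which is what a bias-compensated estimator is meant to address.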