This paper proposes a uniqueness Shapley measure to compare the extent to which different variables are able to identify a subject. Revealing the value of a variable on subject $t$ shrinks the set of possible subjects that $t$ could be. The extent of the shrinkage depends on which other variables have also been revealed. We use the Shapley value to combine all of the reductions in log cardinality due to revealing a variable after some subset of the other variables has been revealed. This uniqueness Shapley measure can be aggregated over subjects, where it becomes a weighted sum of conditional entropies. Aggregation over subsets of subjects can address questions such as how identifying age is for people in a given zip code. Such aggregates have a corresponding expression in terms of cross entropies. We use uniqueness Shapley to investigate the differential effects of revealing variables from the North Carolina voter registration rolls and to identify anomalous solar flares. An enormous speedup (approaching 2000-fold in one example) is obtained by using the all-dimension trees of Moore and Lee (1998) to store the cardinalities we need.
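To make the construction concrete, here is a minimal brute-force sketch. It assumes that the value of a set $S$ of revealed variables for subject $t$ is $\log_2 N - \log_2 N_S(t)$, where $N$ is the number of subjects and $N_S(t)$ is the number of subjects that agree with $t$ on every variable in $S$. Under that assumption the Shapley value of variable $j$ among $d$ variables is $\phi_j(t) = \sum_{S \subseteq D \setminus \{j\}} \frac{|S|!\,(d-|S|-1)!}{d!}\,[\log_2 N_S(t) - \log_2 N_{S \cup \{j\}}(t)]$. The base-2 logarithm, the function names `uniqueness_shapley` and `mean_uniqueness_shapley`, and the toy rows are hypothetical choices for illustration. The sketch counts matches by scanning all rows for every subset rather than using the all-dimension trees of Moore and Lee (1998), so it is only practical for a handful of variables.

```python
from itertools import combinations
from math import factorial, log2


def match_count(data, t, subset):
    """Count rows of `data` that agree with subject `t` on every variable in `subset`."""
    return sum(all(row[k] == t[k] for k in subset) for row in data)


def uniqueness_shapley(data, t):
    """Hypothetical helper: Shapley value of each variable's reduction in log2 cardinality.

    Assumes the value of a revealed set S is log2(N) - log2(N_S(t)).
    Brute force over all subsets of the other variables (no AD-tree);
    `t` must be a row of `data` so every match count is at least 1.
    """
    d = len(t)
    phi = [0.0] * d
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):  # |S| ranges over 0..d-1
            # Standard Shapley weight |S|! (d - |S| - 1)! / d!
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for s in combinations(others, size):
                before = match_count(data, t, s)        # N_S(t)
                after = match_count(data, t, s + (j,))  # N_{S plus j}(t)
                phi[j] += weight * (log2(before) - log2(after))
    return phi


def mean_uniqueness_shapley(data):
    """Hypothetical helper: average the per-subject values over every row of `data`."""
    per_subject = [uniqueness_shapley(data, t) for t in data]
    return [sum(col) / len(data) for col in zip(*per_subject)]


# Illustrative toy rows (zip code, sex, age); not data from the paper.
data = [
    ("27514", "F", 30),
    ("27514", "M", 30),
    ("27514", "F", 45),
    ("27701", "F", 30),
]
print(uniqueness_shapley(data, data[0]))
print(mean_uniqueness_shapley(data))
```

For subject `data[0]` in this toy table the three variables play interchangeable roles, so each receives 2/3 of a bit (up to floating-point rounding), and the values sum to $\log_2 4 - \log_2 1 = 2$ bits, as the Shapley efficiency property requires. Averaging over all rows, as `mean_uniqueness_shapley` does, turns each log-count difference into a difference of empirical entropies, $H(X_{S \cup \{j\}}) - H(X_S) = H(X_j \mid X_S)$, which is one way to see why the aggregate is a weighted sum of conditional entropies.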