We study changes in metrics that are defined on a Cartesian product of trees. Such metrics occur naturally in many practical applications, where a global metric (such as revenue) can be broken down along several hierarchical dimensions (such as location, gender, etc.). Given a change in such a metric, our goal is to identify a small set of non-overlapping data segments that account for the change. An organization interested in improving the metric can then focus its attention on these data segments. Our key contribution is an algorithm that mimics the operation of a hierarchical organization of analysts. The algorithm has been successfully applied, for example, within Google Adwords to help advertisers triage the performance of their advertising campaigns. We show that the algorithm is optimal for two dimensions and has an approximation ratio of $\log^{d-2}(n+1)$ for $d \geq 3$ dimensions, where $n$ is the number of input data segments. For the Adwords application, we show that our algorithm is in fact a $2$-approximation. Mathematically, we identify a certain data pattern called a \emph{conflict} that both guides the design of the algorithm and plays a central role in the hardness results. We use these conflicts both to derive a lower bound of $1.144^{d-2}$ (again for $d \geq 3$) for our algorithm and to show that the problem is NP-hard, justifying the focus on approximation.
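To make the "non-overlapping data segments" requirement concrete, here is a minimal sketch. It is not the algorithm described above. It assumes each hierarchical dimension is a rooted tree stored as a child-to-parent map, and that a data segment picks one node per dimension. Under those assumptions, two segments share data exactly when, in every dimension, one chosen node is an ancestor of (or equal to) the other. The names `Tree`, `Segment`, `is_ancestor_or_self`, `overlaps`, and the toy hierarchies are all hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Tree = Dict[str, Optional[str]]  # node -> parent (None for the root); assumed encoding
Segment = Tuple[str, ...]        # one node per dimension; assumed encoding


def is_ancestor_or_self(tree: Tree, a: str, b: str) -> bool:
    """Return True if node `a` lies on the path from `b` up to the root."""
    node: Optional[str] = b
    while node is not None:
        if node == a:
            return True
        node = tree[node]
    return False


def overlaps(trees: Sequence[Tree], s: Segment, t: Segment) -> bool:
    """Two segments overlap iff their nodes are comparable in every dimension."""
    return all(
        is_ancestor_or_self(tree, x, y) or is_ancestor_or_self(tree, y, x)
        for tree, x, y in zip(trees, s, t)
    )


# Hypothetical toy hierarchies for two dimensions (location and gender).
location: Tree = {"World": None, "US": "World", "EU": "World", "CA": "US", "NY": "US"}
gender: Tree = {"All": None, "F": "All", "M": "All"}
trees = [location, gender]

print(overlaps(trees, ("US", "F"), ("CA", "All")))  # both contain (CA, F)
print(overlaps(trees, ("US", "F"), ("EU", "All")))  # disjoint locations
print(overlaps(trees, ("CA", "F"), ("CA", "M")))    # disjoint genders
```

A candidate explanation for a metric change would then be a set of segments in which no pair satisfies `overlaps`, so that no part of the change is counted twice.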