This paper studies the problem of mining relational tables for data values with high information gain. Such values can help data analysts and secondary data mining algorithms gain insight into strong statistical dependencies and causal relationships between key metrics. We focus on identifying high information gain in temporal relations, where new records are continuously added. We show that information gain can be maintained efficiently in an incremental fashion, which makes it possible to continuously monitor values with high information gain.
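As a rough illustration of what incremental maintenance can look like, the following minimal sketch keeps running counts for one attribute X and one target Y, and derives the standard information gain IG(Y; X) = H(Y) - H(Y | X) from those counts whenever it is queried. The abstract does not give the paper's exact definition or algorithm, so the class name `IncrementalInfoGain`, its methods, and the choice of the textbook entropy-based definition are all assumptions made for illustration only.

```python
import math
from collections import Counter


class IncrementalInfoGain:
    """Hypothetical helper: tracks the counts needed for IG(Y; X) as records arrive."""

    def __init__(self):
        self.n = 0                 # total number of records seen so far
        self.joint = Counter()     # (x, y) -> count
        self.x_counts = Counter()  # x -> count
        self.y_counts = Counter()  # y -> count

    def add(self, x, y):
        """Register one new record in O(1) time; no earlier record is rescanned."""
        self.n += 1
        self.joint[(x, y)] += 1
        self.x_counts[x] += 1
        self.y_counts[y] += 1

    def info_gain(self):
        """Return H(Y) - H(Y | X) in bits, computed from the counts alone."""
        if self.n == 0:
            return 0.0
        # H(Y) = -sum_y p(y) * log2 p(y)
        h_y = -sum(
            (c / self.n) * math.log2(c / self.n)
            for c in self.y_counts.values()
        )
        # H(Y | X) = -sum_{x,y} p(x, y) * log2 p(y | x)
        h_y_given_x = -sum(
            (c / self.n) * math.log2(c / self.x_counts[x])
            for (x, _y), c in self.joint.items()
        )
        return h_y - h_y_given_x


# Usage: feed records one at a time and query the gain at any point.
tracker = IncrementalInfoGain()
for x, y in [("a", 0), ("a", 0), ("b", 1), ("b", 1)]:
    tracker.add(x, y)
gain = tracker.info_gain()
```

In this sketch each insertion costs constant time, while a query costs time proportional to the number of distinct (x, y) pairs rather than the number of records. A method tuned for continuous monitoring would presumably also update the gain itself incrementally, which this sketch does not attempt.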