Acyclic schemas possess known benefits for database design, speeding up queries, and reducing space requirements. An acyclic join dependency (AJD) is lossless with respect to a universal relation if joining the projections associated with the schema results in the original universal relation. An intuitive and standard measure of loss entailed by an AJD is the number of redundant tuples generated by the acyclic join. Recent work has shown that the loss of an AJD can also be characterized by an information-theoretic measure. Motivated by the problem of automatically fitting an acyclic schema to a universal relation, we investigate the connection between these two characterizations of loss. We first show that the loss of an AJD is captured using the notion of KL-Divergence. We then show that the KL-divergence can be used to bound the number of redundant tuples. We prove a deterministic lower bound on the percentage of redundant tuples. For an upper bound, we propose a \e{random database model}, and establish a bound that holds in expectation over a random choice of relation, which coincides with the lower bound for large relation instances.
翻译:Acyclic schemas拥有数据库设计、加速查询和减少空间要求方面的已知利益。 环形合并依赖( AJD) 与一种普遍关系无损, 如果加入与系统相关的预测会导致原始普遍关系。 AJD 引起的损失的直观和标准衡量尺度是环形合并产生的多余图例数量。 最近的工作表明, 丧失一个AJD 也可以以信息理论衡量尺度为特征。 受自动将一个环形组合模型与普遍关系相适应问题的影响, 我们调查这两种损失特征之间的联系。 我们首先显示, 使用 KL- Divergence 的概念来捕获一个AJD 的损失。 然后我们表明, KL- 参数可以用来约束冗余图例数量。 我们证明, 在冗余图案的百分比上, 我们提出一个上限, 我们提议一个 & erandomidom 数据库模型, 并且确定一个限制范围, 与任意选择的更低比例的关系。