Integrity constraints such as functional dependencies (FDs) and multi-valued dependencies (MVDs) are fundamental in database schema design. Likewise, probabilistic conditional independences (CIs) are crucial for reasoning about multivariate probability distributions. The implication problem studies whether a set of constraints (the antecedents) implies another constraint (the consequent). It has been investigated in both the database and the AI literature, under the assumption that all constraints hold exactly. However, many applications today consider constraints that hold only approximately. In this paper we define an approximate implication as a linear inequality between the degrees of satisfaction of the antecedents and of the consequent, and we study the relaxation problem: when does an exact implication relax to an approximate implication? We use information theory to define the degree of satisfaction, and prove several results. First, we show that any implication from a set of data dependencies (MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most quadratic in the number of variables; when the consequent is an FD, the factor can be reduced to 1. Second, we prove that there exists an implication between CIs that does not admit any relaxation; however, we prove that every implication between CIs relaxes "in the limit". Third, we show that the implication problem for differential constraints in market basket analysis also admits a relaxation with a factor equal to 1. Finally, we show how some of the results in the paper can be derived using the I-measure theory, which relates information-theoretic measures to set theory. Our results recover, and sometimes extend, previously known results about the implication problem: the implication of MVDs and FDs can be checked by considering only 2-tuple relations.
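To make the notion of "degree of satisfaction" concrete, here is a minimal sketch, not the paper's own code. It assumes the relation is viewed as a uniform probability distribution over its tuples, that the degree of violation of an FD X → Y is the conditional entropy H(Y|X), and that the degree of violation of an MVD X ↠ Y | Z is the conditional mutual information I(Y;Z|X). The helper names (`entropy`, `cond_entropy`, `cond_mutual_info`) and the sample relation are hypothetical. The example checks the relaxation, with factor 1, of the exact implication {X → Y, Y → Z} ⇒ X → Z, namely H(Z|X) ≤ H(Y|X) + H(Z|Y).

```python
from collections import Counter
from math import log2


def entropy(rows, attrs):
    """Entropy (in bits) of the projection of `rows` onto `attrs`,
    assuming the uniform distribution over the tuples in `rows`."""
    n = len(rows)
    counts = Counter(tuple(r[a] for a in attrs) for r in rows)
    return -sum(c / n * log2(c / n) for c in counts.values())


def cond_entropy(rows, ys, xs):
    """H(ys | xs): zero exactly when the FD xs -> ys holds in `rows`."""
    return entropy(rows, xs + ys) - entropy(rows, xs)


def cond_mutual_info(rows, ys, zs, xs):
    """I(ys ; zs | xs): the assumed violation measure for the MVD xs ->> ys | zs."""
    return (entropy(rows, xs + ys) + entropy(rows, xs + zs)
            - entropy(rows, xs + ys + zs) - entropy(rows, xs))


# Hypothetical relation in which X -> Y and Y -> Z hold only approximately.
rows = [
    {"X": 1, "Y": "a", "Z": "p"},
    {"X": 1, "Y": "b", "Z": "p"},
    {"X": 2, "Y": "b", "Z": "q"},
    {"X": 3, "Y": "c", "Z": "r"},
]

antecedents = cond_entropy(rows, ["Y"], ["X"]) + cond_entropy(rows, ["Z"], ["Y"])
consequent = cond_entropy(rows, ["Z"], ["X"])

# Approximate implication with factor 1: the violation of the consequent
# is bounded by the total violation of the antecedents.
print(consequent, "<=", antecedents)
```

When both antecedents hold exactly, the right-hand side is 0, which forces the consequent to hold exactly as well; in this sense the inequality generalizes the exact implication.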