Integrity constraints such as functional dependencies (FD) and multi-valued dependencies (MVD) are fundamental in database schema design. Likewise, probabilistic conditional independences (CI) are crucial for reasoning about multivariate probability distributions. The implication problem asks whether a set of constraints (the antecedents) implies another constraint (the consequent); it has been studied in both the database and the AI literature under the assumption that all constraints hold exactly. However, many applications today consider constraints that hold only approximately. In this paper we define an approximate implication as a linear inequality between the degree of satisfaction of the antecedents and that of the consequent, and we study the relaxation problem: when does an exact implication relax to an approximate implication? We use information theory to define the degree of satisfaction, and prove several results. First, we show that any implication from a set of data dependencies (MVDs + FDs) can be relaxed to a simple linear inequality with a factor at most quadratic in the number of variables; when the consequent is an FD, the factor can be reduced to 1. Second, we prove that there exists an implication between CIs that does not admit any relaxation; however, every implication between CIs relaxes "in the limit". Finally, we show that the implication problem for differential constraints in market basket analysis also admits a relaxation with a factor equal to 1. Our results recover, and sometimes extend, several previously known results about the implication problem: implication of MVDs can be checked by considering only 2-tuple relations, and implication of differential constraints for frequent item sets can be checked by considering only databases containing a single transaction.
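To make "degree of satisfaction" concrete, the sketch below assumes the usual information-theoretic measures: the conditional entropy H(Y|X) for an FD X → Y and the conditional mutual information I(Y;Z|X) for an MVD X ↠ Y|Z, both computed over the uniform distribution on the (distinct) tuples of a relation. Each measure is zero exactly when the constraint holds (for the MVD, when X, Y, Z together cover the schema) and grows as the constraint is violated. The function names and the toy relations `R` and `S` are hypothetical. The final check illustrates one factor-1 relaxation for an FD consequent: transitivity A → B, B → C ⊨ A → C relaxes to H(C|A) ≤ H(B|A) + H(C|B), which follows from the chain rule. It is one illustrative instance, not the paper's general construction.

```python
from collections import Counter
from math import log2


def entropy(rel, attrs):
    """Shannon entropy (bits) of the projection of rel onto attrs,
    under the uniform distribution over the tuples of rel."""
    n = len(rel)
    counts = Counter(tuple(t[a] for a in attrs) for t in rel)
    return -sum(c / n * log2(c / n) for c in counts.values())


def cond_entropy(rel, y, x):
    # H(Y | X) = H(XY) - H(X); zero iff the FD X -> Y holds exactly.
    return entropy(rel, x + y) - entropy(rel, x)


def cond_mutual_info(rel, y, z, x):
    # I(Y; Z | X) = H(XY) + H(XZ) - H(XYZ) - H(X);
    # zero iff the MVD X ->> Y | Z holds exactly (X, Y, Z cover the schema).
    return (entropy(rel, x + y) + entropy(rel, x + z)
            - entropy(rel, x + y + z) - entropy(rel, x))


# Hypothetical relation R(A, B, C): A -> B holds, B -> C is violated.
R = [
    {"A": 1, "B": "a", "C": "x"},
    {"A": 1, "B": "a", "C": "y"},
    {"A": 2, "B": "b", "C": "x"},
    {"A": 3, "B": "b", "C": "z"},
]

# Hypothetical relation S(A, B, C): the MVD A ->> B | C holds.
S = [
    {"A": 1, "B": "a", "C": "x"},
    {"A": 1, "B": "a", "C": "y"},
    {"A": 1, "B": "b", "C": "x"},
    {"A": 1, "B": "b", "C": "y"},
]

if __name__ == "__main__":
    h_ab = cond_entropy(R, ["B"], ["A"])  # antecedent A -> B
    h_bc = cond_entropy(R, ["C"], ["B"])  # antecedent B -> C
    h_ac = cond_entropy(R, ["C"], ["A"])  # consequent A -> C
    print("H(B|A), H(C|B), H(C|A):", h_ab, h_bc, h_ac)
    # Factor-1 relaxation of FD transitivity.
    print("relaxation holds:", h_ac <= h_ab + h_bc + 1e-12)
    # Dropping a tuple from S breaks the MVD, so I(B;C|A) becomes positive.
    print("I(B;C|A) on S:", cond_mutual_info(S, ["B"], ["C"], ["A"]))
    print("I(B;C|A) on S[:3]:", cond_mutual_info(S[:3], ["B"], ["C"], ["A"]))
```

Using entropy rather than, say, the fraction of violating tuples is what makes the relaxation expressible as a linear inequality: the measures are linear combinations of joint entropies, so Shannon inequalities such as the chain rule translate directly into bounds between antecedents and consequent.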