How should we quantify the inconsistency of a database that violates integrity constraints? Proper measures are important for various tasks, such as progress indication and action prioritization in cleaning systems, and reliability estimation for new datasets. To choose an appropriate inconsistency measure, it is important to identify the properties desired in the application and to understand which of these are guaranteed or at least expected in practice. For example, in some use cases the inconsistency should decrease if constraints are eliminated; in others it should be stable and avoid jitters and jumps in reaction to small changes in the database. We embark on a systematic investigation of properties for database inconsistency measures. We investigate a collection of basic measures that have been proposed in the past in both the Knowledge Representation and Database communities, analyze their theoretical properties, and empirically observe their behaviour in an experimental study. We also demonstrate how the framework can lead to new inconsistency measures by introducing a new measure that, in contrast to the rest, satisfies all of the properties we consider and can be computed in polynomial time.