Knowledge Bases (KBs) provide a structured representation of the real world in the form of extensive collections of facts about real-world entities, their properties, and their relationships. They are ubiquitous in large-scale intelligent systems that exploit structured information for tasks such as structured search, question answering, and reasoning, so their data quality is paramount. The inevitability of change in the real world brings us to a central property of KBs: they are highly dynamic, in that the information they contain is constantly subject to change. In other words, KBs are unstable. In this paper, we investigate the notion of KB stability, specifically the problem of KBs changing due to real-world change. Some entity-property pairs no longer undergo change in reality (e.g., Einstein-children or Tesla-founders), while others might well change in the future (e.g., Tesla-board member or Ronaldo-occupation as of 2022). This notion of real-world grounded change differs from changes that affect only the data, notably correction and delayed insertion, which have already received attention in data cleaning, vandalism detection, and completeness estimation. To analyze KB stability, we proceed in three steps. (1) We present heuristics to delineate changes due to world evolution from delayed completions and corrections, and use these to study the real-world evolution behaviour of diverse Wikidata domains, finding a high skew in terms of properties. (2) We evaluate heuristics to identify entities and properties that are unlikely to change due to real-world change, and filter inherently stable entities and properties. (3) We evaluate the possibility of predicting stability post-hoc, specifically predicting change in a property of an entity, and find that this is possible with an F1 score of up to 83% on a balanced binary stability prediction task.