We introduce and study knowledge drift (KD), a complex form of drift that occurs in hierarchical classification. Under KD the vocabulary of concepts, their individual distributions, and the is-a relations between them can all change over time. The main challenge is that, since the ground-truth concept hierarchy is unobserved, it is hard to tell apart different forms of KD. For instance, introducing a new is-a relation between two concepts might be confused with individual changes to those concepts, but it is far from equivalent. Failure to identify the right kind of KD compromises the concept hierarchy used by the classifier, leading to systematic prediction errors. Our key observation is that in many human-in-the-loop applications (like smart personal assistants) the user knows whether and what kind of drift occurred recently. Motivated by this, we introduce TRCKD, a novel approach that combines automated drift detection and adaptation with an interactive stage in which the user is asked to disambiguate between different kinds of KD. In addition, TRCKD implements a simple but effective knowledge-aware adaptation strategy. Our simulations show that often a handful of queries to the user are enough to substantially improve prediction performance on both synthetic and realistic data.