Concept learning deals with learning description logic concepts from background knowledge and input examples. The goal is to learn a concept that covers all positive examples while not covering any negative examples. This non-trivial task is often formulated as a search problem within an infinite, quasi-ordered concept space. Although state-of-the-art models have been successfully applied to this problem, their large-scale application has been severely hindered by excessive exploration that incurs impractical runtimes. Here, we propose a remedy for this limitation. We reformulate the learning problem as a multi-label classification problem and propose a neural embedding model (NERO) that learns permutation-invariant embeddings for sets of examples, tailored towards predicting $F_1$ scores of pre-selected description logic concepts. By ranking such concepts in descending order of predicted scores, a possible goal concept can be detected within a few retrieval operations, i.e., without excessive exploration. Importantly, the top-ranked concepts can be used to start the search procedure of state-of-the-art symbolic models in multiple advantageous regions of the concept space, rather than at the most general concept $\top$. Our experiments on 5 benchmark datasets with 770 learning problems strongly suggest that NERO significantly (p-value < 1%) outperforms the state-of-the-art models in terms of $F_1$ score, the number of explored concepts, and the total runtime. We provide an open-source implementation of our approach.
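To make the multi-label formulation concrete, the following is a minimal sketch (in PyTorch), not the authors' implementation: it assumes instances are identified by integer indices, a fixed vocabulary of pre-selected concepts, and per-concept $F_1$ targets in $[0,1]$; all names are hypothetical. Mean pooling over instance embeddings makes the representation of each example set permutation-invariant, and a shared scorer maps the pooled positive and negative sets to one predicted score per concept.

```python
import torch
import torch.nn as nn


class SetToF1Scores(nn.Module):
    """Permutation-invariant embedding of (positive, negative) example sets,
    mapped to predicted F1 scores for a fixed list of pre-selected concepts."""

    def __init__(self, num_instances: int, num_concepts: int, dim: int = 32):
        super().__init__()
        self.emb = nn.Embedding(num_instances, dim)   # one vector per instance
        self.scorer = nn.Sequential(                  # pooled sets -> concept scores
            nn.Linear(2 * dim, dim),
            nn.ReLU(),
            nn.Linear(dim, num_concepts),
            nn.Sigmoid(),                             # predicted F1 in (0, 1)
        )

    def forward(self, pos_idx: torch.Tensor, neg_idx: torch.Tensor) -> torch.Tensor:
        # Mean pooling is invariant to the order of the examples in each set.
        pos = self.emb(pos_idx).mean(dim=0)
        neg = self.emb(neg_idx).mean(dim=0)
        return self.scorer(torch.cat([pos, neg]))


# Hypothetical usage: rank pre-selected concepts for one learning problem.
model = SetToF1Scores(num_instances=1000, num_concepts=50)
pos = torch.tensor([3, 17, 42])          # indices of positive examples
neg = torch.tensor([5, 8])               # indices of negative examples
scores = model(pos, neg)                 # shape: (num_concepts,)
top_concepts = torch.argsort(scores, descending=True)[:10]
```

Under this reading, the top-ranked concepts can either be returned directly after a few retrieval operations or handed to a symbolic learner as multiple starting points for its search, instead of starting from $\top$.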