Structure learning is a core problem in AI, central to the fields of neuro-symbolic AI and statistical relational learning. It consists of automatically learning a logical theory from data. The basis for structure learning is mining repeating patterns in the data, known as structural motifs. Finding these patterns reduces the exponential search space and therefore guides the learning of formulas. Despite the importance of motif learning, it is still not well understood. We present the first principled approach for mining structural motifs in lifted graphical models, languages that blend first-order logic with probabilistic models. Our approach uses a stochastic process to measure the similarity of entities in the data. Our first contribution is an algorithm that depends on two intuitive hyperparameters: one controlling the uncertainty in the entity similarity measure, and one controlling the softness of the resulting rules. Our second contribution is a preprocessing step in which we perform hierarchical clustering on the data to reduce the search space to the most relevant data. Our third contribution is an O(n ln n) algorithm (where n is the number of entities in the data) for clustering structurally related data. We evaluate our approach on standard benchmarks and show that we outperform state-of-the-art structure learning approaches by up to 6% in accuracy and up to 80% in runtime.
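The abstract does not say which stochastic process is used to compare entities. The sketch below assumes a random walk over an undirected graph of entities and treats two entities as similar when their Monte Carlo estimates of the truncated hitting time from a common source are close. The function names, `num_walks`, `max_length`, and the tolerance `epsilon` (a hypothetical stand-in for the hyperparameter controlling uncertainty in the similarity measure) are illustrative assumptions, not the paper's method. Grouping is done by sorting on hitting time, so that step costs O(n ln n) in the number of entities; this is not claimed to be the paper's clustering algorithm.

```python
import random
from collections import defaultdict


def truncated_hitting_times(graph, source, num_walks=1000, max_length=5, seed=0):
    """Estimate, by Monte Carlo random walks, the truncated hitting time
    from `source` to every node. Nodes not reached within `max_length`
    steps are assigned `max_length` (the truncation)."""
    rng = random.Random(seed)
    totals = defaultdict(float)
    for _ in range(num_walks):
        first_hit = {source: 0}
        node = source
        for step in range(1, max_length + 1):
            neighbours = graph[node]
            if not neighbours:
                break  # dead end: stop this walk early
            node = rng.choice(neighbours)
            first_hit.setdefault(node, step)  # keep only the first visit
        for v in graph:
            totals[v] += first_hit.get(v, max_length)
    return {v: totals[v] / num_walks for v in graph}


def group_similar(times, epsilon):
    """Sort nodes by estimated hitting time and start a new group whenever
    the gap to the previous node exceeds `epsilon` (hypothetical tolerance)."""
    groups = []
    for v in sorted(times, key=times.get):
        if groups and times[v] - times[groups[-1][-1]] <= epsilon:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups


# Toy example: a hub entity linked to three structurally interchangeable leaves.
graph = {
    "hub": ["l1", "l2", "l3"],
    "l1": ["hub"],
    "l2": ["hub"],
    "l3": ["hub"],
}
times = truncated_hitting_times(graph, source="hub")
groups = group_similar(times, epsilon=0.5)
print(groups)
```

Because the three leaves play the same structural role relative to the hub, their estimated hitting times differ only by sampling noise and they fall into a single group, while the hub forms its own group. A smaller `epsilon` makes the grouping stricter and more sensitive to that noise.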