Multi-relational databases are the basis of most consolidated data collections in science and industry today. Most learning and mining algorithms, however, require data to be represented in a propositional form. While a variety of specialized machine learning algorithms can operate directly on multi-relational data sets, propositionalization algorithms instead transform multi-relational databases into propositional data sets, thereby allowing traditional machine learning and data mining algorithms to be applied without modification. One prominent propositionalization algorithm is RELAGGS by Krogel and Wrobel, which transforms the data by nested aggregations. We propose N-RELAGGS, a new neural-network-based algorithm in the spirit of RELAGGS that employs trainable composite aggregate functions instead of the static aggregate functions used in the original approach. In this way, we can train the propositionalization jointly with the prediction model or, alternatively, use the learned aggregations as embeddings in other algorithms. We demonstrate the increased predictive performance by comparing N-RELAGGS with RELAGGS and multiple other state-of-the-art algorithms.
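To make the idea of propositionalization by aggregation concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The toy schema (`customers`, `orders`), the helper names, and the `soft_pool` function are all assumptions introduced for illustration. `static_aggregates` stands in for fixed aggregate functions of the kind RELAGGS uses, while `soft_pool` is one simple example of a parameterized aggregate (a softmax-weighted average whose parameter `beta` could in principle be learned); the trainable composite aggregates proposed here may differ substantially.

```python
import math
from collections import defaultdict

# Hypothetical toy schema: a target table and a one-to-many child table.
customers = [{"id": 1}, {"id": 2}]
orders = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 5.0},
]

def static_aggregates(values):
    """Fixed aggregate functions applied to one group of child values."""
    if not values:
        return {"count": 0, "sum": 0.0, "mean": 0.0, "min": 0.0, "max": 0.0}
    return {
        "count": len(values),
        "sum": sum(values),
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }

def soft_pool(values, beta):
    """Parameterized aggregate (illustrative assumption): softmax-weighted
    average. beta = 0 gives the mean; large positive beta approaches the max."""
    if not values:
        return 0.0
    shift = max(beta * v for v in values)  # subtract the max for numerical stability
    weights = [math.exp(beta * v - shift) for v in values]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def propositionalize(targets, children, key, column, beta=0.0):
    """Flatten a one-to-many relation into one feature row per target row."""
    groups = defaultdict(list)
    for row in children:
        groups[row[key]].append(row[column])
    table = []
    for target in targets:
        values = groups.get(target["id"], [])
        features = {
            f"{column}_{name}": value
            for name, value in static_aggregates(values).items()
        }
        features[f"{column}_soft_pool"] = soft_pool(values, beta)
        table.append({"id": target["id"], **features})
    return table

flat = propositionalize(customers, orders, key="customer_id", column="amount")
for row in flat:
    print(row)
```

Each row of `flat` is a propositional example that a standard learner could consume directly. In a trainable setting, a parameter such as `beta` would be optimized together with the downstream prediction model rather than fixed in advance.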