We study the problem of computing an embedding of the tuples of a relational database in a manner that is extensible to dynamic changes of the database. Importantly, the embedding of existing tuples should not change due to the embedding of newly inserted tuples (as database applications might rely on existing embeddings), while the embedding of all tuples, old and new, should retain high quality. This task is challenging since state-of-the-art embedding techniques for structured data, such as (adaptations of) embeddings on graphs, have inherent inter-dependencies among the embeddings of different entities. We present the FoRWaRD algorithm (Foreign Key Random Walk Embeddings for Relational Databases), which draws on embedding techniques for general graphs and knowledge graphs and inherently utilizes the schema and its key and foreign-key constraints. We compare FoRWaRD to an alternative approach that we devise by adapting node embeddings for graphs (Node2Vec) to dynamic databases. We show that FoRWaRD is comparable and sometimes superior to state-of-the-art embeddings in the static (traditional) setting, using a collection of downstream column-prediction tasks over geographical and biological domains. More importantly, in the dynamic setting FoRWaRD outperforms the alternatives consistently and often considerably, and suffers only a mild reduction in quality even when the database consists mostly of newly inserted tuples.
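The stability requirement can be illustrated with a small sketch. The code below is not the FoRWaRD algorithm, whose details are not given in this abstract. It is a toy stand-in, under stated assumptions, that walks randomly along foreign-key references from a newly inserted tuple and averages the already-computed embeddings it encounters, leaving every existing embedding untouched. The tuple identifiers, the function names (`build_fk_adjacency`, `random_walk`, `embed_new_tuple`), and the averaging rule are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def build_fk_adjacency(fk_refs):
    """Map each tuple to the tuples it is linked to by a foreign-key reference.

    fk_refs is a list of (referencing_tuple, referenced_tuple) pairs.
    References are traversable in both directions.
    """
    adj = defaultdict(list)
    for src, dst in fk_refs:
        adj[src].append(dst)
        adj[dst].append(src)
    return adj


def random_walk(adj, start, length, rng):
    """Return a walk of at most `length` steps starting at `start`."""
    walk = [start]
    for _ in range(length):
        nbrs = adj.get(walk[-1], [])
        if not nbrs:
            break
        walk.append(rng.choice(nbrs))
    return walk


def embed_new_tuple(adj, frozen, new_id, dim, walks=20, length=3, seed=0):
    """Embed `new_id` as the mean of frozen embeddings met on random walks.

    `frozen` is only read, never written, so existing embeddings cannot change.
    (Assumed averaging rule; a real method would train the new vector instead.)
    """
    rng = random.Random(seed)
    total = [0.0] * dim
    count = 0
    for _ in range(walks):
        for t in random_walk(adj, new_id, length, rng)[1:]:
            if t in frozen:
                total = [a + b for a, b in zip(total, frozen[t])]
                count += 1
    return [x / count for x in total] if count else total


# Hypothetical toy data: cities reference countries through a foreign key.
fk_refs = [("city:1", "country:10"), ("city:2", "country:10"),
           ("city:3", "country:20")]
frozen = {"city:1": [1.0, 0.0], "city:2": [0.8, 0.2], "country:10": [0.9, 0.1],
          "city:3": [0.0, 1.0], "country:20": [0.1, 0.9]}

# A new tuple arrives; only its own vector is computed.
fk_refs.append(("city:4", "country:10"))
adj = build_fk_adjacency(fk_refs)
new_vec = embed_new_tuple(adj, frozen, "city:4", dim=2)
```

In this toy setup the new city lands near the other tuples reachable from its country, while the `frozen` dictionary is identical before and after the insertion, which is the property the dynamic setting demands.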