We present module theory and linear maps as a powerful generalised and computationally efficient framework for the relational data model, which underpins today's relational database systems. Based on universal constructions of modules we obtain compact and computationally efficient data structures for data collections corresponding to union and deletion, repeated union, Cartesian product and key-indexed data. Free modules naturally give rise to polysets, which generalise multisets and facilitate expressing database queries as multilinear maps with asymptotically efficient evaluation on polyset constructors. We introduce compact maps as a way of representing infinite (poly)sets constructible from an infinite base set and its elements by addition and subtraction. We show how natural joins generalise to algebraic joins, while intersection is implemented by a novel algorithm on nested compact maps that carefully avoids visiting parts of the input that do not contribute to the eventual output. Our algebraic framework leads to a worst-case optimal evaluation of cyclic relational queries, which is known to be impossible using textbook query optimisers that operate on lists of records only.
翻译:我们提出模块理论和线性地图,作为关系数据模型的强大通用和计算效率框架,作为当今关系数据库系统的基础。基于模块的通用构造,我们获得与合并和删除、重复结合、笛卡尔产品和关键索引数据相应的集成数据集的压缩和计算效率数据结构。自由模块自然产生聚集,这些聚集概括多集,便于以多线性地图的形式表达数据库查询,同时对聚体构造器进行无症状有效的评估。我们引入紧凑地图,作为代表无限基数及其元素的无限(多)设置和增减。我们展示了如何自然结合通缩与代数结合,而交叉则由嵌入式紧凑的地图上的新算法实施,该算法谨慎地避免访问不会对最终产出作出贡献的部分输入。我们的代数框架导致对循环关系查询进行最差的优化评价,据知不可能使用仅对记录列表操作的教科书选取方法。