Aggregating data in a database could also be called "integrating along fibers": given functions $\pi\colon E\to D$ and $s\colon E\to R$, where $(R,\circledast)$ is a commutative monoid, we want a new function $(\circledast s)_\pi$ that sends each $d\in D$ to the "sum" of all $s(e)$ for which $\pi(e)=d$. The operation lives alongside querying -- or more generally data migration -- in typical database usage: one wants to know how much Canadians spent on cell phones in 2021, for example, and such requests typically require both aggregation and querying. But whereas querying has an elegant category-theoretic treatment in terms of parametric right adjoints between copresheaf categories, a categorical formulation of aggregation -- especially one that lives alongside that for querying -- appears to be completely absent from the literature. In this paper we show how both querying and aggregation fit into the "polynomial ecosystem". Starting with the category $\mathbf{Poly}$ of polynomial functors in one variable, we review the relatively recent results of Ahman-Uustalu and Garner, which showed that the framed bicategory $\mathbb{C}\mathbf{at}^\sharp$ of comonads in $\mathbf{Poly}$ is precisely the right setting for data migration: its objects are categories and its bicomodules are parametric right adjoints between their copresheaf categories. We then develop a great deal of theory, compressed for space reasons, including local monoidal closed structures, a coclosure to bicomodule composition, and an understanding of adjoints in $\mathbb{C}\mathbf{at}^\sharp$. Doing so allows us to derive interesting mathematical results, e.g.\ that the ordinary operation of transposing a span can be decomposed into the composite of two more primitive operations, and then finally to explain how aggregation arises, alongside querying, in $\mathbb{C}\mathbf{at}^\sharp$.
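For readers who want a concrete picture of "integrating along fibers", here is a minimal set-level sketch in Python. It is not the categorical construction developed in the paper; it only computes $(\circledast s)_\pi$ for a finite $E$. The function name `aggregate`, its parameters, the optional `base` argument (an explicit listing of $D$, so that empty fibers receive the monoid unit), and the purchase records are all hypothetical and chosen for illustration.

```python
import operator
from typing import Callable, Dict, Hashable, Iterable, Optional, TypeVar

E = TypeVar("E")
D = TypeVar("D", bound=Hashable)
R = TypeVar("R")


def aggregate(
    elements: Iterable[E],
    pi: Callable[[E], D],
    s: Callable[[E], R],
    op: Callable[[R, R], R],
    unit: R,
    base: Optional[Iterable[D]] = None,
) -> Dict[D, R]:
    """Fold s(e) over each fiber of pi using the commutative monoid (op, unit).

    Assumes `elements` is finite and that `op` is associative and commutative
    with identity `unit`, so the order of traversal does not matter.
    """
    # Empty fibers map to the unit, but only if D is listed explicitly.
    totals: Dict[D, R] = {d: unit for d in base} if base is not None else {}
    for e in elements:
        d = pi(e)
        totals[d] = op(totals.get(d, unit), s(e))
    return totals


# Made-up purchase records, for illustration only.
purchases = [
    {"country": "CA", "year": 2021, "item": "cell phone", "amount": 300},
    {"country": "CA", "year": 2021, "item": "cell phone", "amount": 450},
    {"country": "CA", "year": 2020, "item": "cell phone", "amount": 200},
    {"country": "US", "year": 2021, "item": "cell phone", "amount": 500},
    {"country": "CA", "year": 2021, "item": "laptop", "amount": 900},
]

# Querying step: restrict to cell-phone purchases made in 2021.
queried = [p for p in purchases if p["year"] == 2021 and p["item"] == "cell phone"]

# Aggregation step: sum amounts along the fibers of the "country" projection.
spent = aggregate(
    queried,
    pi=lambda p: p["country"],
    s=lambda p: p["amount"],
    op=operator.add,
    unit=0,
    base=["CA", "US", "MX"],
)
print(spent)  # {'CA': 750, 'US': 500, 'MX': 0}
```

The example separates the two steps named in the abstract: a filter plays the role of the query, and the fold over fibers plays the role of aggregation. Swapping `operator.add` and `0` for another commutative monoid, such as `max` with unit `0` on non-negative amounts, changes the "sum" without changing the code.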
Title: Functorial aggregation