The transition towards data-centric AI requires revisiting the notion of data from both mathematical and implementation standpoints in order to obtain unified data-centric machine learning packages. To this end, this work proposes unifying principles offered by the categorical and cochain notions of data, and discusses the importance of these principles in the transition to data-centric AI. In the categorical notion, data is viewed as a mathematical structure that is acted upon via morphisms that preserve this structure. In the cochain notion, data is viewed as a function defined on a discrete domain of interest and acted upon via operators. While these notions are almost orthogonal, together they provide a unifying definition of data, ultimately impacting the way machine learning packages are developed, implemented, and utilized by practitioners.