Data mesh is an emerging domain-driven, decentralized data architecture that aims to minimize or avoid the operational bottlenecks associated with centralized, monolithic data architectures in enterprises. The topic has piqued practitioners' interest, and there is considerable gray literature on it. At the same time, we observe a lack of academic attempts at defining and building upon the concept. Hence, in this article, we start from the foundations and characterize the data mesh architecture in terms of its design principles, architectural components, capabilities, and organizational roles. We systematically collected, analyzed, and synthesized 114 industrial gray literature articles. The review provides insights into practitioners' perspectives on the four key principles of data mesh: data as a product, domain ownership of data, self-serve data platform, and federated computational governance. Moreover, because data mesh is comparable to service-oriented architecture (SOA), we mapped the findings from the gray literature onto reference architectures from the SOA academic literature to create reference architectures that describe three key dimensions of data mesh: organization of capabilities and roles, development, and runtime. Finally, we discuss open research issues in data mesh, partially based on the findings from the gray literature.