Data analytics serves core business reporting needs in many software companies, acts as a source of truth for key information, and enables advanced solutions (e.g., predictive models, machine learning, real-time recommendations) that grow the business. A self-service, multi-tenant, API-first, and scalable data platform is the foundational requirement for creating an enterprise data marketplace, which enables the creation, publishing, and exchange of data products. Such a marketplace enables the exploration and discovery of data products and provides high-level data governance and oversight of marketplace contents. In this paper, we describe our way to the gourmet data product marketplace. We cover the design principles, implementation details, technology choices, and the journey to build an enterprise data platform that meets the above characteristics. The platform comprises ingestion, streaming, storage, transformation, schema generation, fail-safe mechanisms, data sharing, access management, automatic identification of PII data, self-service storage optimization recommendations, and CI/CD integration. We then show how the platform enables and operates the data marketplace, facilitating the exchange of stable data products across users and tenants. We motivate and show how we run scalable decentralized data governance. All of this is built and run for Cimpress Technology (CT), which operates the Mass Customization Platform for Cimpress and its businesses. The CT data platform serves thousands of users from different platform participants, with data drawn from heterogeneous sources. Data is ingested at a rate of well over 1,000 individual messages per second, and the platform serves more than 100k analytical queries daily.