Scientific communities naturally tend to organize around data ecosystems created by the combination of their observational devices, their data repositories, and the workflows needed to carry their research from observation to discovery. However, these legacy data ecosystems are now breaking down under the pressure of exponential growth in the volume and velocity of the data these workflows handle, a problem further complicated by the need to integrate the highly data-intensive methods of the Artificial Intelligence revolution. Enabling groundbreaking science that makes full use of this new, data-saturated research environment will require distributed systems that support dramatically improved resource sharing, workflow portability and composability, and data ecosystem convergence. The Cybercosm vision presented in this white paper describes a radically different approach to the architecture of distributed systems for data-intensive science and its application workflows. As opposed to traditional models that restrict interoperability by hiving off storage, networking, and computing resources into separate technology silos, Cybercosm defines a minimally sufficient hypervisor as a spanning layer for its data plane, one that virtualizes and converges the local resources of the system's nodes in a fully interoperable manner. By building on a common, universal interface into which the problems that afflict today's data-intensive workflows can be decomposed and attacked, Cybercosm aims to support scalable, portable, and composable workflows that span and merge the distributed data ecosystems that characterize leading-edge research communities today.