Foundation models (e.g., ChatGPT, Stable Diffusion) pervasively influence society, warranting immediate social attention. While the models themselves garner much attention, accurately characterizing their impact requires considering the broader sociotechnical ecosystem. We propose Ecosystem Graphs as a documentation framework to transparently centralize knowledge of this ecosystem. Ecosystem Graphs is composed of assets (datasets, models, applications) linked together by dependencies that indicate technical (e.g., how Bing relies on GPT-4) and social (e.g., how Microsoft relies on OpenAI) relationships. To supplement the graph structure, each asset is further enriched with fine-grained metadata (e.g., the license or training emissions). We document the ecosystem extensively at https://crfm.stanford.edu/ecosystem-graphs/. As of March 16, 2023, we annotate 262 assets (64 datasets, 128 models, 70 applications) from 63 organizations linked by 356 dependencies. We show that Ecosystem Graphs functions as a powerful abstraction and interface for achieving the minimum transparency required to address myriad use cases. We therefore envision Ecosystem Graphs as a community-maintained resource that provides value to stakeholders spanning AI researchers, industry professionals, social scientists, auditors, and policymakers.
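The graph structure described in the abstract (assets of three kinds, linked by dependencies and enriched with metadata) can be illustrated with a minimal Python sketch. This is not the project's actual schema or code: the names `Asset`, `EcosystemGraph`, `add_dependency`, `upstream_of`, and `organization_dependencies` are hypothetical, the metadata value is a placeholder, and deriving organization-level (social) links from asset-level (technical) ones is an assumption made here for illustration only. The only facts taken from the abstract are that Bing relies on GPT-4 and that Microsoft relies on OpenAI.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Asset:
    """One node in the graph (hypothetical schema, not the project's own)."""
    name: str
    kind: str  # "dataset", "model", or "application"
    organization: str
    metadata: dict = field(default_factory=dict)  # e.g. license, training emissions


class EcosystemGraph:
    """Assets keyed by name, plus directed (downstream, upstream) dependencies."""

    def __init__(self):
        self.assets = {}
        self.dependencies = set()

    def add_asset(self, asset):
        self.assets[asset.name] = asset

    def add_dependency(self, downstream, upstream):
        # Both endpoints must already be documented as assets.
        for name in (downstream, upstream):
            if name not in self.assets:
                raise KeyError(f"unknown asset: {name}")
        self.dependencies.add((downstream, upstream))

    def upstream_of(self, name):
        """Names of the assets that `name` directly depends on."""
        return sorted(u for d, u in self.dependencies if d == name)

    def organization_dependencies(self):
        """Assumption: read social links off technical ones that cross organizations."""
        pairs = set()
        for downstream, upstream in self.dependencies:
            a = self.assets[downstream].organization
            b = self.assets[upstream].organization
            if a != b:
                pairs.add((a, b))
        return sorted(pairs)

    def counts_by_kind(self):
        """Number of assets per kind, as in the abstract's summary statistics."""
        return Counter(asset.kind for asset in self.assets.values())


graph = EcosystemGraph()
# The metadata value below is a placeholder, not a documented fact.
graph.add_asset(Asset("GPT-4", "model", "OpenAI", {"license": "placeholder"}))
graph.add_asset(Asset("Bing", "application", "Microsoft"))
graph.add_dependency("Bing", "GPT-4")

print(graph.upstream_of("Bing"))          # ['GPT-4']
print(graph.organization_dependencies())  # [('Microsoft', 'OpenAI')]
print(dict(graph.counts_by_kind()))       # {'model': 1, 'application': 1}
```

Storing dependencies as directed pairs keeps the sketch small; a fuller version might attach a type or free-text description to each dependency, but the abstract does not specify how the framework records this.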