Democratization of AI means not only that people can freely use AI, but also that people can collectively decide how AI is to be used. In particular, collective decision-making power is required to redress the negative externalities from the development of increasingly advanced AI systems, including degradation of the digital commons and unemployment from automation. The rapid pace of AI development and deployment currently leaves little room for this power. Monopolized in the hands of private corporations, the development of the most capable foundation models has proceeded largely without public input. There is currently no implemented mechanism for ensuring that the economic value generated by such models is redistributed to account for their negative externalities. The citizens who have generated the data necessary to train models do not have input on how their data are to be used. In this work, we propose that a public data trust assert control over training data for foundation models. In particular, this trust should scrape the internet as a digital commons and license the resulting data to commercial model developers in exchange for a percentage of revenues from deployment. First, we argue in detail for the existence of such a trust. We also discuss feasibility and potential risks. Second, we detail a number of ways for a data trust to incentivize model developers to use training data only from the trust. We propose a mix of verification mechanisms, potential regulatory action, and positive incentives. We conclude by highlighting other potential benefits of our proposed data trust and connecting our work to ongoing efforts in data and compute governance.