Reclaiming the Digital Commons: A Public Data Trust for Training Data

Democratization of AI means not only that people can freely use AI, but also that people can collectively decide how AI is to be used. In particular, collective decision-making power is required to redress the negative externalities from the development of increasingly advanced AI systems, including degradation of the digital commons and unemployment from automation. The rapid pace of AI development and deployment currently leaves little room for this power. Monopolized in the hands of private corporations, the development of the most capable foundation models has proceeded largely without public input. There is currently no implemented mechanism for ensuring that the economic value generated by such models is redistributed to account for their negative externalities. The citizens that have generated the data necessary to train models do not have input on how their data are to be used. In this work, we propose that a public data trust assert control over training data for foundation models. In particular, this trust should scrape the internet as a digital commons, to license to commercial model developers for a percentage cut of revenues from deployment. First, we argue in detail for the existence of such a trust. We also discuss feasibility and potential risks. Second, we detail a number of ways for a data trust to incentivize model developers to use training data only from the trust. We propose a mix of verification mechanisms, potential regulatory action, and positive incentives. We conclude by highlighting other potential benefits of our proposed data trust and connecting our work to ongoing efforts in data and compute governance.
