Democratization of AI involves training and deploying machine learning models across heterogeneous and potentially massive environments. The diversity of data opens up many possibilities for advancing AI systems, but it also raises pressing concerns such as privacy, security, and equity that require special attention. This work shows that it is theoretically impossible to design a rational learning algorithm that can successfully learn across heterogeneous environments, a capability we figuratively call collective intelligence (CI). By representing learning algorithms as choice correspondences over a hypothesis space, we are able to axiomatize them with essential properties. Unfortunately, the only feasible algorithm compatible with all of the axioms is standard empirical risk minimization (ERM), which learns arbitrarily from a single environment. Our impossibility result reveals informational incomparability between environments as one of the foremost obstacles for researchers designing novel algorithms that learn from multiple environments, and it sheds light on prerequisites for success in critical areas of machine learning such as out-of-distribution generalization, federated learning, algorithmic fairness, and multi-modal learning.
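To make the abstract's framing concrete, the following is a minimal sketch (not the paper's formalism) of ERM viewed as a choice over a hypothesis space, trained on a single environment's sample. The hypothesis space, loss, and data here are illustrative assumptions.

```python
def empirical_risk(h, sample):
    """Average 0-1 loss of hypothesis h on one environment's labeled sample."""
    return sum(h(x) != y for x, y in sample) / len(sample)

def erm(hypotheses, sample):
    """ERM as a choice over the hypothesis space: pick a minimizer of
    empirical risk on a single environment (ties broken arbitrarily)."""
    return min(hypotheses, key=lambda h: empirical_risk(h, sample))

# Toy hypothesis space: threshold classifiers on the real line.
hypotheses = [lambda x, t=t: int(x > t) for t in (0.0, 0.5, 1.0)]

# One environment's sample: points above 0.5 are labeled positive.
sample = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]

best = erm(hypotheses, sample)
print([best(x) for x, _ in sample])  # matches the labels: [0, 0, 1, 1]
```

Note that `erm` consults only one environment's sample; the abstract's point is that no axiomatically rational choice rule can aggregate samples from multiple, informationally incomparable environments.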