Recent breakthroughs based on big/foundation models reveal a vague avenue for AI, that is, \emph{big data, big/foundation models, big learning, $\cdots$}. Following that avenue, here we elaborate on our newly introduced big learning. Specifically, big learning exhaustively exploits the information/tasks inherent in large-scale \emph{complete/incomplete} training data by learning to simultaneously model many/all joint/conditional/marginal data distributions (hence the name big learning) with one universal foundation model. We reveal that big learning is what existing foundation models are implicitly doing; accordingly, our big learning provides high-level guidance for the flexible design and improvement of foundation models. Moreover, big learning ($i$) offers great flexibility for handling complete/incomplete training data and for customizing trustworthy data tasks; ($ii$) potentially delivers all joint/conditional/marginal data capabilities after training; ($iii$) significantly reduces the training-test gap and improves model generalization; and ($iv$) potentially unifies conventional machine-learning paradigms and enables their flexible cooperation, manifesting as a universal learning paradigm. Preliminary experiments verify the effectiveness of the presented big learning.