Inspired by human cognition, machine learning systems are gradually revealing the advantages of sparser and more modular architectures. Recent work demonstrates that some modular architectures not only generalize well but also offer better out-of-distribution generalization, scaling properties, learning speed, and interpretability. A key intuition behind the success of such systems is that the data-generating system in most real-world settings is thought to consist of sparsely interacting parts, so endowing models with similar inductive biases should be helpful. However, the field has lacked a rigorous quantitative assessment of such systems because these real-world data distributions are complex and unknown. In this work, we provide a thorough assessment of common modular architectures through the lens of simple, known modular data distributions. We highlight the benefits of modularity and sparsity and reveal insights into the challenges faced when optimizing modular systems. In doing so, we propose evaluation metrics that highlight the benefits of modularity, the regimes in which these benefits are substantial, and the sub-optimality of current end-to-end learned modular systems relative to their claimed potential.