Multi-input multi-output (MIMO) architectures propose to train multiple subnetworks within one base network and then average the subnetwork predictions to benefit from ensembling for free. Despite some relative success, these architectures are wasteful in their use of parameters. Indeed, we highlight in this paper that the learned subnetworks fail to share even generic features, which limits their applicability to smaller mobile and AR/VR devices. We posit this behavior stems from an ill-posed part of the multi-input multi-output framework. To solve this issue, we propose a novel unmixing step in MIMO architectures that allows subnetworks to properly share features. Preliminary experiments on CIFAR-100 show that our adjustments enable feature sharing and improve model performance for small architectures.
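To make the setup concrete, the following is a minimal sketch of the multi-input multi-output idea referred to above, not the paper's architecture or its proposed unmixing step. The `MimoNet` name, the backbone, and all layer sizes are illustrative assumptions; only the general pattern (M inputs stacked at the input, M prediction heads, averaged at test time) follows the MIMO framework described in the abstract.

```python
# Minimal MIMO-style network sketch (illustrative assumptions throughout).
import torch
import torch.nn as nn

class MimoNet(nn.Module):
    def __init__(self, num_subnetworks=3, num_classes=100):
        super().__init__()
        self.m = num_subnetworks
        # Shared backbone: the M inputs are stacked along the channel axis.
        self.backbone = nn.Sequential(
            nn.Conv2d(3 * self.m, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One classification head per subnetwork.
        self.heads = nn.ModuleList(nn.Linear(64, num_classes) for _ in range(self.m))

    def forward(self, xs):
        # xs: list of M tensors of shape (B, 3, H, W); during training each
        # typically comes from an independently shuffled copy of the dataset.
        feats = self.backbone(torch.cat(xs, dim=1))
        return torch.stack([head(feats) for head in self.heads], dim=1)  # (B, M, C)

model = MimoNet()
x = torch.randn(8, 3, 32, 32)
# At test time the same image fills every input slot and the M subnetwork
# predictions are averaged, giving the "ensemble for free" behavior.
logits = model([x] * model.m).mean(dim=1)  # (B, num_classes)
```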