We introduce fully stochastic layers into vision transformers without causing any severe drop in performance. The added stochasticity improves the robustness of the visual features and strengthens privacy. Concretely, linear layers with fully stochastic parameters, used during both training and inference, transform the feature activations of each multilayer perceptron. Such stochastic linear operations preserve the topological structure formed by the set of tokens passing through the shared multilayer perceptron. This encourages the recognition task to be learned from the topological structure of the tokens rather than from their values, which in turn provides the desired robustness and privacy of the visual features. We apply our features to three different tasks, namely adversarial robustness, network calibration, and feature privacy, and obtain encouraging results on all of them. Furthermore, we showcase an experimental setup for federated and transfer learning, where vision transformers with stochastic layers are again shown to be well behaved. Our source code will be made publicly available.
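As a rough illustration of the mechanism sketched above, the snippet below implements a linear layer whose weight matrix is freshly sampled at every forward pass (during both training and inference) and is shared by all tokens, and attaches it to the output of a transformer MLP block. This is a minimal sketch under assumptions: the class names, the Gaussian sampling, and the exact placement inside the block are hypothetical and are not taken from the paper.

```python
import torch
import torch.nn as nn


class StochasticLinear(nn.Module):
    """Linear map whose weights are re-sampled on every forward call.
    The same random matrix is applied to all tokens in the batch, so
    relations among tokens are transformed consistently.
    (Hypothetical sketch; the paper's sampling scheme may differ.)"""

    def __init__(self, dim: int, std: float = 1.0):
        super().__init__()
        self.dim = dim
        self.std = std

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim). A fresh random weight matrix is drawn
        # each call, at training time and at inference time alike.
        w = torch.randn(self.dim, self.dim, device=x.device, dtype=x.dtype) * self.std
        return x @ w.t()


class MLPWithStochasticLayer(nn.Module):
    """Transformer MLP block whose output activations pass through a
    fully stochastic linear layer (illustrative placement only)."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)
        self.stochastic = StochasticLinear(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.fc2(self.act(self.fc1(x)))
        return self.stochastic(x)
```

Because the random transform is shared across the tokens of a sample, the pairwise structure of the token set is preserved up to a common linear map, which is the property the abstract refers to as preserving the tokens' topological structure.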