Transformers are state-of-the-art deep learning models composed of stacked attention and point-wise, fully connected layers, designed for handling sequential data. Transformers are not only ubiquitous throughout Natural Language Processing (NLP), but have recently inspired a new wave of research on Computer Vision (CV) applications. In this work, a Vision Transformer (ViT) is applied to predict the state variables of 2-dimensional Ising model simulations. Our experiments show that ViTs outperform state-of-the-art Convolutional Neural Networks (CNNs) when trained on a small number of microstate images from the Ising model corresponding to various boundary conditions and temperatures. This work opens the possibility of applying ViTs to other simulations and raises interesting research directions on how attention maps can learn about the underlying physics governing different phenomena.
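As a concrete illustration of the kind of input data described above, the minimal sketch below generates 2-dimensional Ising microstates with single-spin-flip Metropolis updates under periodic boundary conditions and pairs each microstate image with the temperature used to generate it, i.e. one of the state variables a ViT could be trained to predict. The lattice size, sweep count, and temperature grid are illustrative assumptions, not the paper's experimental settings, and the ViT itself (not shown) is assumed to follow the standard patch-embedding architecture.

```python
import numpy as np

def metropolis_ising(L=32, T=2.5, n_sweeps=500, rng=None):
    """Sample a 2D Ising microstate on an L x L lattice at temperature T
    (periodic boundary conditions) via single-spin-flip Metropolis updates."""
    rng = rng or np.random.default_rng()
    spins = rng.choice([-1, 1], size=(L, L))
    beta = 1.0 / T
    for _ in range(n_sweeps):
        for _ in range(L * L):
            i, j = rng.integers(L), rng.integers(L)
            # Sum of the four nearest neighbours with periodic wrap-around.
            nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
                  + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
            dE = 2.0 * spins[i, j] * nn  # energy cost of flipping spin (i, j)
            if dE <= 0 or rng.random() < np.exp(-beta * dE):
                spins[i, j] *= -1
    return spins

# Build a small labelled dataset: each microstate image is paired with the
# temperature that generated it, the target a vision model would regress.
temps = np.linspace(1.5, 3.5, 5)
dataset = [(metropolis_ising(L=32, T=t), t) for t in temps]
```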