Accurate and physically consistent modeling of Earth system dynamics requires machine-learning architectures that operate directly on continuous geophysical fields and preserve their underlying geometric structure. Here we introduce Field-Space Attention, a mechanism for Earth system Transformers that computes attention in the physical domain rather than in a learned latent space. By maintaining all intermediate representations as continuous fields on the sphere, the architecture yields interpretable internal states and facilitates the enforcement of scientific constraints. The model employs a fixed, non-learned multiscale decomposition and learns structure-preserving deformations of the input field, allowing coherent integration of coarse- and fine-scale information while avoiding the optimization instabilities characteristic of standard single-scale Vision Transformers. Applied to global temperature super-resolution on a HEALPix grid, Field-Space Transformers converge more rapidly and stably than conventional Vision Transformer and U-Net baselines, while requiring substantially fewer parameters. The explicit preservation of field structure throughout the network allows physical and statistical priors to be embedded directly into the architecture, yielding improved fidelity and reliability in data-driven Earth system modeling. These results position Field-Space Attention as a compact, interpretable, and physically grounded building block for next-generation Earth system prediction and generative modeling frameworks.
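As a rough illustration of the idea rather than the paper's actual formulation, the minimal sketch below computes a toy attention directly on field values over a HEALPix grid in nested ordering, combined with a fixed (non-learned) coarsen/refine pair so that every intermediate quantity remains a field on the sphere. All names (`field_space_attention`, `coarsen`, `refine`), the Gaussian similarity kernel, and the residual recombination are assumptions for demonstration only; the learned structure-preserving deformations described in the abstract are omitted.

```python
# Illustrative sketch only: all design details here are assumptions, not the
# paper's method. Requires only NumPy.
import numpy as np

def coarsen(field, factor=4):
    """Fixed (non-learned) coarsening: average each group of `factor` pixels.
    In HEALPix nested ordering, the 4 children of a pixel are contiguous,
    so this averages children onto their parent at the next-coarser nside."""
    return field.reshape(-1, factor).mean(axis=1)

def refine(field, factor=4):
    """Fixed refinement: replicate each coarse pixel over its children."""
    return np.repeat(field, factor)

def field_space_attention(field, tau=1.0):
    """Toy attention in the physical domain: queries, keys, and values are
    the raw field values, so the output is again a field on the same pixels."""
    q = field[:, None]                 # (npix, 1) queries = field values
    k = field[None, :]                 # (1, npix) keys = field values
    w = np.exp(-(q - k) ** 2 / tau)    # similarity measured in physical units
    w /= w.sum(axis=1, keepdims=True)  # normalize attention weights per pixel
    return w @ field                   # weighted average of field values

# Example: a HEALPix map with nside=4 -> 12 * 4**2 = 192 pixels (nested order).
npix = 192
rng = np.random.default_rng(0)
field = rng.normal(size=npix)

coarse = coarsen(field)                 # fixed multiscale decomposition
detail = field - refine(coarse)         # fine-scale residual
out = refine(field_space_attention(coarse)) + detail
assert out.shape == field.shape         # output remains a field on the grid
```

Because every intermediate here (`coarse`, `detail`, `out`) stays indexed by HEALPix pixels in physical units, one can inspect internal states as maps or check simple priors (e.g., how well the global mean is preserved) at any stage, which is the kind of interpretability and constraint enforcement the abstract attributes to keeping all representations as fields.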