Causal Masking on Spatial Data: An Information-Theoretic Case for Learning Spatial Datasets with Unimodal Language Models

Language models are traditionally designed around causal masking. In domains with spatial or relational structure, causal masking is often viewed as inappropriate, and sequential linearizations are used instead. Yet the question of whether it is viable to accept the information loss that causal masking introduces on nonsequential data has received little direct study, in part because few domains offer both spatial and sequential representations of the same dataset. In this work, we investigate this question in the domain of chess, which naturally supports both representations. We train language models with bidirectional and causal self-attention mechanisms on both spatial (board-based) and sequential (move-based) data. Our results show that models trained on spatial board states, even with causal masking, consistently achieve stronger playing strength than models trained on sequential data. While our experiments are conducted on chess, our results are methodological and may have broader implications: applying causal masking to spatial data is a viable procedure for training unimodal LLMs on spatial data, and in some domains it is even preferable to sequentialization.
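To make the contrast between causal and bidirectional self-attention on a board concrete, here is a minimal sketch in plain Python. It assumes, purely for illustration, that a board is flattened row-major into 64 tokens with one token per square; the abstract does not specify the encoding, and the function names (`attention_mask`, `visible_squares`, `visible_fraction`) are hypothetical. The sketch only shows which squares each token may attend to under each mask, which is the source of the information loss discussed above.

```python
BOARD_SIZE = 8
N_TOKENS = BOARD_SIZE * BOARD_SIZE  # assumption: one token per square, row-major order


def attention_mask(n_tokens, causal):
    """Return an n x n boolean matrix; mask[i][j] is True if token i may attend to token j."""
    if causal:
        # Causal masking: token i sees only itself and earlier tokens.
        return [[j <= i for j in range(n_tokens)] for i in range(n_tokens)]
    # Bidirectional attention: every token sees every token.
    return [[True] * n_tokens for _ in range(n_tokens)]


def visible_squares(mask, square_index):
    """List the token indices that the given square's token may attend to."""
    return [j for j, allowed in enumerate(mask[square_index]) if allowed]


def visible_fraction(mask):
    """Fraction of all (query, key) pairs that the mask allows."""
    allowed = sum(sum(row) for row in mask)
    return allowed / (len(mask) ** 2)


causal = attention_mask(N_TOKENS, causal=True)
bidirectional = attention_mask(N_TOKENS, causal=False)

print(len(visible_squares(causal, 0)))          # 1: the first square sees only itself
print(len(visible_squares(causal, 63)))         # 64: the last square sees the whole board
print(len(visible_squares(bidirectional, 0)))   # 64
print(visible_fraction(causal))                 # 0.5078125
```

Under this flattening, only the final token has access to the full board with a causal mask, while roughly half of all square-to-square interactions are hidden overall. The abstract's claim is that models can nonetheless learn well from such inputs.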

