For language models to aid physics research, they must first encode representations of mathematical and natural language discourse that lead to coherent explanations, with statements that are correctly ordered and relevant. We present a collection of datasets developed to evaluate the performance of language models in this regard, measuring capabilities in sentence ordering, position, section prediction, and discourse coherence. Analysis of the data reveals the equations and sub-disciplines that are most common in physics discourse, as well as the sentence-level frequency of equations and expressions. We present baselines demonstrating that contemporary language models are challenged by coherence-related tasks in physics, even when trained on mathematical natural language objectives.