Scholarly text is often laden with jargon, or specialized language that divides disciplines. We extend past work that characterizes science at the level of word types, by using BERT-based word sense induction to find additional words that are widespread but overloaded with different uses across fields. We define scholarly jargon as discipline-specific word types and senses, and estimate its prevalence across hundreds of fields using interpretable, information-theoretic metrics. We demonstrate the utility of our approach for science of science and computational sociolinguistics by highlighting two key social implications. First, we measure audience design, and find that most fields reduce jargon when publishing in general-purpose journals, but some do so more than others. Second, though jargon has varying correlation with articles' citation rates within fields, it nearly always impedes interdisciplinary impact. Broadly, our measurements can inform ways in which language could be revised to serve as a bridge rather than a barrier in science.
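To make "interpretable, information-theoretic metrics" concrete, here is a minimal sketch. It assumes normalized pointwise mutual information (NPMI) between a word and a field as the specificity score. The abstract does not state the exact formula, so treat NPMI as an illustrative choice, not the authors' method. The hypothetical helper `jargon_rate` then estimates prevalence as the share of a text's tokens whose field score exceeds a threshold. The threshold value is also a hypothetical choice. The same counting would apply unchanged to sense-tagged tokens (for example, labels produced by a word sense induction step) in place of plain word types.

```python
import math
from collections import Counter


def npmi_by_field(field_tokens: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Score each (field, word) pair with NPMI (assumed metric; illustrative only)."""
    field_counts = {f: Counter(toks) for f, toks in field_tokens.items()}
    total = sum(sum(c.values()) for c in field_counts.values())

    # Corpus-wide frequency of each word across all fields.
    word_totals: Counter = Counter()
    for c in field_counts.values():
        word_totals.update(c)

    scores: dict[str, dict[str, float]] = {}
    for field, counts in field_counts.items():
        p_f = sum(counts.values()) / total  # P(field)
        scores[field] = {}
        for word, n in counts.items():
            p_wf = n / total                  # P(word, field)
            p_w = word_totals[word] / total   # P(word)
            pmi = math.log(p_wf / (p_w * p_f))
            h = -math.log(p_wf)
            # NPMI lies in [-1, 1]; treat the degenerate p_wf == 1 case as perfect association.
            scores[field][word] = pmi / h if h > 0 else 1.0
    return scores


def jargon_rate(tokens: list[str], field_scores: dict[str, float], threshold: float = 0.1) -> float:
    """Hypothetical prevalence measure: fraction of tokens scored above `threshold` for a field."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if field_scores.get(t, -1.0) > threshold)
    return hits / len(tokens)


# Toy usage with made-up tokens (not real data):
scores = npmi_by_field({
    "bio": ["cell", "cell", "data"],
    "cs": ["data", "model", "model"],
})
print(scores["bio"]["cell"], scores["bio"]["data"])
print(jargon_rate(["cell", "data"], scores["bio"]))
```

In this toy corpus, "cell" appears only in the biology tokens and gets a positive score. "data" occurs equally often in both fields, so its score is zero and it does not count as jargon. A real analysis would need corpus-scale counts, and probably smoothing or minimum-frequency cutoffs, which this sketch omits.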
翻译:学术性文字往往用词典或专业语言来区分学科。 我们扩展过去科学在字型层面的特点, 使用基于 BERT 的字感感感感应, 寻找其他广泛但在不同领域使用过量的词汇。 我们将学术性词典定义为特定学科的字型和感应, 并使用可解释的信息理论度量来估计其在数百个领域的流行程度。 我们通过突出两个关键社会影响, 展示了我们科学学和计算社会语言学方法的实用性。 首先, 我们测量受众设计, 发现大多数领域在出版一般用途期刊时会减少术语, 但有些领域会比其他领域要多。 其次, 虽然术语与本领域文章的引用率有不同的相关性, 但几乎总是会阻碍跨学科影响。 广义上, 我们的测量可以告知如何修改语言作为桥梁而不是科学障碍。