What do writing features tell us about AI papers?

As the number of submissions to conferences grows quickly, the task of assessing the quality of academic papers automatically, convincingly, and with high accuracy attracts increasing attention. We argue that studying interpretable dimensions of these submissions could lead to scalable solutions. We extract a collection of writing features and construct a suite of prediction tasks to assess the usefulness of these features in predicting citation counts and the publication of AI-related papers. Depending on the venue, the writing features can predict conference vs. workshop appearance with F1 scores of up to 60-90, sometimes even outperforming content-based tf-idf features and RoBERTa. We show that the features describe writing style more than content. To further understand the results, we estimate the causal impact of the most indicative features. Our analysis of writing features provides a perspective on assessing and refining the writing of academic articles at scale.

Related content

TF-IDF

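TF-IDF (term frequency–inverse document frequency) is a weighting technique commonly used in information retrieval and text mining. It is a statistical measure of how important a word is to one document within a document collection or corpus. A word's importance increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to how frequently it appears across the corpus. Variants of tf-idf weighting are often used by search engines as a measure or ranking of the relevance between a document and a user query. Besides tf-idf, web search engines also use ranking methods based on link analysis to determine the order in which documents appear in search results.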
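The weighting described above can be sketched in a few lines of Python. This is a minimal illustration that assumes one common variant (term frequency normalized by document length, multiplied by the natural log of the inverse document frequency); the function name `tf_idf` and the toy corpus are hypothetical, and real search engines and libraries typically use smoothed or otherwise adjusted formulas.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Compute tf-idf weights for a corpus given as a list of token lists.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / number of documents containing the term).
    Returns one {term: weight} dict per document.
    """
    n_docs = len(corpus)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for doc in corpus:
        df.update(set(doc))

    weights = []
    for doc in corpus:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus (hypothetical), already tokenized by whitespace.
docs = [
    "the model predicts citation counts".split(),
    "the writing style of the paper".split(),
    "the workshop paper".split(),
]
scores = tf_idf(docs)
```

In this toy corpus, "the" appears in every document, so its inverse document frequency is ln(3/3) = 0 and its weight vanishes, while a word that occurs in only one document, such as "citation", receives a positive weight.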
