A fast and integrative algorithm for clustering performance evaluation in author name disambiguation - Zhuanzhi (专知) Papers


February 5, 2021

A fast and integrative algorithm for clustering performance evaluation in author name disambiguation


From arXiv, 20 pages

Author name disambiguation results are often evaluated with measures such as Cluster-F, K-metric, Pairwise-F, Splitting & Lumping Error, and B-cubed. Although these measures use distinct evaluation schemes, this paper shows that they can all be calculated in a single framework through a set of common steps. These steps compare truth and predicted clusters through two hash tables that record name instances with their predicted cluster indices and the frequencies of those indices per truth cluster. This integrative calculation greatly reduces runtime and scales to clustering tasks involving millions of name instances, completing within a few seconds. During the integration process, B-cubed and K-metric are shown to produce the same precision and recall scores. Within this framework, name instance pairs for Pairwise-F are counted using a heuristic that is faster than a state-of-the-art algorithm. The integrative calculation is described in detail with examples and pseudo-code to help scholars implement each measure easily and validate the correctness of their implementation. It will also help scholars compare the similarities and differences of multiple measures before selecting the ones that best characterize the clustering performance of their disambiguation methods.
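The shared-table idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's pseudo-code. The function name `evaluate`, the helper `_ratio`, the table layout, and the zero-denominator convention are all assumptions made for illustration. Pairs are counted here with the standard "n choose 2" formula over the overlap counts, which may differ from the heuristic the paper uses. K-metric is taken in its commonly used form, the geometric mean of average cluster purity and average author purity.

```python
from collections import Counter, defaultdict
from math import comb, sqrt


def _ratio(num, den):
    # Assumption: a ratio with a zero denominator (e.g. no pairs) scores 0.0.
    return num / den if den else 0.0


def evaluate(truth, pred):
    """truth, pred: dicts mapping each name-instance id to a cluster label."""
    # Table 1 (assumed layout): instance id -> predicted cluster index (`pred`).
    # Table 2 (assumed layout): truth cluster -> {predicted index: frequency}.
    overlap = defaultdict(Counter)
    for inst, t in truth.items():
        overlap[t][pred[inst]] += 1

    n = len(truth)
    truth_size = {t: sum(c.values()) for t, c in overlap.items()}
    pred_size = Counter(pred[inst] for inst in truth)

    # B-cubed: each overlap cell of size k holds k instances, and each of them
    # scores k / (size of its predicted cluster) for precision and
    # k / (size of its truth cluster) for recall.
    b3_p = _ratio(sum(k * k / pred_size[p]
                      for c in overlap.values() for p, k in c.items()), n)
    b3_r = _ratio(sum(k * k / truth_size[t]
                      for t, c in overlap.items() for k in c.values()), n)

    # Pairwise: pairs that share both a truth and a predicted cluster,
    # divided by all predicted pairs (precision) or all truth pairs (recall).
    both = sum(comb(k, 2) for c in overlap.values() for k in c.values())
    pw_p = _ratio(both, sum(comb(s, 2) for s in pred_size.values()))
    pw_r = _ratio(both, sum(comb(s, 2) for s in truth_size.values()))

    return {
        "bcubed_precision": b3_p,
        "bcubed_recall": b3_r,
        "bcubed_f": _ratio(2 * b3_p * b3_r, b3_p + b3_r),
        # The two K-metric components are the same sums as B-cubed
        # precision and recall, so they are reused directly.
        "k_metric": sqrt(b3_p * b3_r),
        "pairwise_precision": pw_p,
        "pairwise_recall": pw_r,
        "pairwise_f": _ratio(2 * pw_p * pw_r, pw_p + pw_r),
    }
```

For a toy input with truth clusters {a, b, c} and {d, e} and predicted clusters {a, b} and {c, d, e}, the overlap table has three non-empty cells of sizes 2, 1, and 2. B-cubed precision and recall both come to 11/15, and pairwise precision and recall both come to 0.5. Every measure is read from the same overlap counts in one pass over the instances, which is the design point the abstract emphasizes.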

