NLPSTT Test: 比较NLP系统业绩的工具包 (NLPStatTest: A Toolkit for Comparing NLP System Performance) - 专知论文

会员服务 ·

0

Performer · 估计/估计量 · 统计量 · NLP · Automator ·

2020 年 11 月 26 日

NLPStatTest: A Toolkit for Comparing NLP System Performance

翻译：NLPSTT Test: 比较NLP系统业绩的工具包

Haotian Zhu,Denise Mak,Jesse Gioannini,Fei Xia

from arxiv, Will appear in AACL-IJCNLP 2020

Statistical significance testing centered on p-values is commonly used to compare NLP system performance, but p-values alone are insufficient because statistical significance differs from practical significance. The latter can be measured by estimating effect size. In this paper, we propose a three-stage procedure for comparing NLP system performance and provide a toolkit, NLPStatTest, that automates the process. Users can upload NLP system evaluation scores and the toolkit will analyze these scores, run appropriate significance tests, estimate effect size, and conduct power analysis to estimate Type II error. The toolkit provides a convenient and systematic way to compare NLP system performance that goes beyond statistical significance testing

翻译：以p-value为核心的统计意义测试通常用于比较NLP系统性能,但单是p-value本身是不够的,因为统计意义与实际意义不同,后者可以通过估计影响大小来衡量。在本文中,我们提出一个三阶段程序,用于比较NLP系统性能,并提供工具箱NLPSTATTTTest,使过程自动化。用户可以上传NLP系统性能评分,工具包将分析这些分数,进行适当的意义测试,估计影响大小,并进行能力分析,以估计第二类错误。工具包提供了方便和系统的方法,比较NLP系统性能,超越了统计意义测试。

0

相关内容

Performer

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

【Github】All4NLP：自然语言处理相关资源整理

【Github】All4NLP：自然语言处理相关资源整理

AINLP

23+阅读 · 2019年8月9日

上百种预训练中文词向量：Chinese-Word-Vectors

上百种预训练中文词向量：Chinese-Word-Vectors

AINLP

23+阅读 · 2019年2月26日

NLP 2018 Highlights：2018自然语言处理技术亮点汇总

NLP 2018 Highlights：2018自然语言处理技术亮点汇总

AINLP

10+阅读 · 2019年2月9日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

已删除

将门创投

5+阅读 · 2017年10月20日

【推荐】决策树/随机森林深入解析

【推荐】决策树/随机森林深入解析

机器学习研究会

5+阅读 · 2017年9月21日

自然语言处理 (NLP)资源大全

自然语言处理 (NLP)资源大全

机械鸡

35+阅读 · 2017年9月17日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

Design and Analysis of Switchback Experiments

Design and Analysis of Switchback Experiments

Arxiv

0+阅读 · 2021年1月14日

Classification with Strategically Withheld Data

Classification with Strategically Withheld Data

Arxiv

0+阅读 · 2021年1月14日

Similarity-based prediction for channel mapping and user positioning

Arxiv

0+阅读 · 2021年1月14日

OrigamiSet1.0: Two New Datasets for Origami Classification and Difficulty Estimation

Arxiv

0+阅读 · 2021年1月14日

Anomaly Detection Support Using Process Classification

Arxiv

0+阅读 · 2021年1月13日

A Baseline for Few-Shot Image Classification

Arxiv

7+阅读 · 2020年3月1日

Text Classification Algorithms: A Survey

Arxiv

6+阅读 · 2019年4月25日

Notes on Deep Learning for NLP

Arxiv

22+阅读 · 2018年8月30日

Fine-tuned Language Models for Text Classification

Arxiv

5+阅读 · 2018年1月18日

ParVecMF: A Paragraph Vector-based Matrix Factorization Recommender System

Arxiv

9+阅读 · 2018年1月10日

VIP会员

文章信息

相关主题

估计/估计量

相关VIP内容

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

【NeurIPS2025】面向时间序列基础模型的合成序列符号数据生成方法

军事通信市场七大趋势概述

【CMU博士论文】深度学习中泛化的量化、理解与改进

面向低光照图像增强的扩散模型

相关资讯

【Github】All4NLP：自然语言处理相关资源整理

【Github】All4NLP：自然语言处理相关资源整理

AINLP

23+阅读 · 2019年8月9日

上百种预训练中文词向量：Chinese-Word-Vectors

上百种预训练中文词向量：Chinese-Word-Vectors

AINLP

23+阅读 · 2019年2月26日

NLP 2018 Highlights：2018自然语言处理技术亮点汇总

NLP 2018 Highlights：2018自然语言处理技术亮点汇总

AINLP

10+阅读 · 2019年2月9日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

已删除

将门创投

5+阅读 · 2017年10月20日

【推荐】决策树/随机森林深入解析

【推荐】决策树/随机森林深入解析

机器学习研究会

5+阅读 · 2017年9月21日

自然语言处理 (NLP)资源大全

自然语言处理 (NLP)资源大全

机械鸡

35+阅读 · 2017年9月17日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

相关论文

Design and Analysis of Switchback Experiments

Design and Analysis of Switchback Experiments

Arxiv

0+阅读 · 2021年1月14日

Classification with Strategically Withheld Data

Classification with Strategically Withheld Data

Arxiv

0+阅读 · 2021年1月14日

Similarity-based prediction for channel mapping and user positioning

Arxiv

0+阅读 · 2021年1月14日

OrigamiSet1.0: Two New Datasets for Origami Classification and Difficulty Estimation

Arxiv

0+阅读 · 2021年1月14日

Anomaly Detection Support Using Process Classification

Arxiv

0+阅读 · 2021年1月13日

A Baseline for Few-Shot Image Classification

Arxiv

7+阅读 · 2020年3月1日

Text Classification Algorithms: A Survey

Arxiv

6+阅读 · 2019年4月25日

Notes on Deep Learning for NLP

Arxiv

22+阅读 · 2018年8月30日

Fine-tuned Language Models for Text Classification

Arxiv

5+阅读 · 2018年1月18日

ParVecMF: A Paragraph Vector-based Matrix Factorization Recommender System

Arxiv

9+阅读 · 2018年1月10日

微信扫码咨询专知VIP会员