快速反应COVID-19文字分类的定期表达式 (Regular Expressions for Fast-response COVID-19 Text Classification) - 专知论文

会员服务 ·

0

正则化项 · 查全率/召回率 · COVID-19 · 查准率/准确率 · 正则表达式 ·

2021 年 6 月 21 日

Regular Expressions for Fast-response COVID-19 Text Classification

翻译：快速反应COVID-19文字分类的定期表达式

Igor L. Markov,Jacqueline Liu,Adam Vagner

from arxiv, 10 pages, 7 tables

Text classifiers are at the core of many NLP applications and use a variety of algorithmic approaches and software. This paper introduces infrastructure and methodologies for text classifiers based on large-scale regular expressions. In particular, we describe how Facebook determines if a given piece of text - anything from a hashtag to a post - belongs to a narrow topic such as COVID-19. To fully define a topic and evaluate classifier performance we employ human-guided iterations of keyword discovery, but do not require labeled data. For COVID-19, we build two sets of regular expressions: (1) for 66 languages, with 99% precision and recall >50%, (2) for the 11 most common languages, with precision >90% and recall >90%. Regular expressions enable low-latency queries from multiple platforms. Response to challenges like COVID-19 is fast and so are revisions. Comparisons to a DNN classifier show explainable results, higher precision and recall, and less overfitting. Our learnings can be applied to other narrow-topic classifiers.

翻译：文本分类器是许多 NLP 应用程序的核心, 并使用多种算法方法和软件。本文介绍了基于大规模常规表达式的文本分类器的基础设施和方法。特别是, 我们描述Facebook如何确定某个文本( 从标签到文章的任何内容)是否属于CCOVID-19等狭隘的话题。要充分定义一个专题并评估分类器性能, 我们使用人类引导的关键词发现迭代, 但不需要标签数据。对于 COVID-19, 我们建立两套常规表达器:(1) 66种语言, 精确度为99%, 并记得 > 50%, (2) 最常用的11种语言, 精确度为 > 90%, 记得 > 90% 。常规表达器允许多个平台进行低延迟查询。对 COVID-19 等挑战的反应是快速的, 如此修改。与 DNN 分类器的比较显示可解释的结果、更高精度和回忆, 并且不那么完美。我们的学习方法可以适用于其他狭隘的分类器。

0

相关内容

正则化项

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

【ACL2020-Allen AI】预训练语言模型中的无监督域聚类

【ACL2020-Allen AI】预训练语言模型中的无监督域聚类

专知会员服务

24+阅读 · 2020年4月7日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

面向机器学习和数据分析的特征工程（Feature Engineering for Machine Learning and Data Analytics），附新书419页pdf

面向机器学习和数据分析的特征工程（Feature Engineering for Machine Learning and Data Analytics），附新书419页pdf

专知会员服务

62+阅读 · 2019年10月26日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【新书】Python编程基础，669页pdf

【新书】Python编程基础，669页pdf

专知会员服务

197+阅读 · 2019年10月10日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【电子书推荐】Data Science with Python and Dask

【电子书推荐】Data Science with Python and Dask

专知会员服务

44+阅读 · 2019年6月1日

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

【论文】Awesome Relation Extraction Paper（关系抽取）（PART III）

【论文】Awesome Relation Extraction Paper（关系抽取）（PART III）

AINLP

25+阅读 · 2019年8月21日

【论文】Awesome Relation Classification Paper（关系分类）（PART I）

【论文】Awesome Relation Classification Paper（关系分类）（PART I）

AINLP

5+阅读 · 2019年8月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

人工智能 | SCI期刊专刊信息3条

人工智能 | SCI期刊专刊信息3条

Call4Papers

5+阅读 · 2019年1月10日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

Contextualizing Variation in Text Style Transfer Datasets

Arxiv

0+阅读 · 2021年8月17日

MATCH: Metadata-Aware Text Classification in A Large Hierarchy

Arxiv

12+阅读 · 2021年2月15日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

Text Classification Algorithms: A Survey

Arxiv

16+阅读 · 2020年5月20日

Graph Convolutional Networks for Text Classification

Arxiv

11+阅读 · 2018年10月17日

Universal Language Model Fine-tuning for Text Classification

Arxiv

3+阅读 · 2018年5月23日

CERES: Distantly Supervised Relation Extraction from the Semi-Structured Web

Arxiv

6+阅读 · 2018年4月12日

Simultaneously Self-Attending to All Mentions for Full-Abstract Biological Relation Extraction

Arxiv

9+阅读 · 2018年2月28日

Fine-tuned Language Models for Text Classification

Arxiv

5+阅读 · 2018年1月18日

Arbitrarily-Oriented Text Recognition

Arxiv

3+阅读 · 2017年11月12日

VIP会员

文章信息

相关主题

查全率/召回率

查准率/准确率

正则表达式

相关VIP内容

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

【ACL2020-Allen AI】预训练语言模型中的无监督域聚类

【ACL2020-Allen AI】预训练语言模型中的无监督域聚类

专知会员服务

24+阅读 · 2020年4月7日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

面向机器学习和数据分析的特征工程（Feature Engineering for Machine Learning and Data Analytics），附新书419页pdf

面向机器学习和数据分析的特征工程（Feature Engineering for Machine Learning and Data Analytics），附新书419页pdf

专知会员服务

62+阅读 · 2019年10月26日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【新书】Python编程基础，669页pdf

【新书】Python编程基础，669页pdf

专知会员服务

197+阅读 · 2019年10月10日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【电子书推荐】Data Science with Python and Dask

【电子书推荐】Data Science with Python and Dask

专知会员服务

44+阅读 · 2019年6月1日

热门VIP内容

开通专知VIP会员享更多权益服务

数据要素发展报告(2025年)：附下载

人工智能代理提升战时舰船战备水平

【NeurIPS2025教程】大语言模型规划

NeurIPS 2025 教程：深度学习训练不稳定性的理论洞见

相关资讯

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

【论文】Awesome Relation Extraction Paper（关系抽取）（PART III）

【论文】Awesome Relation Extraction Paper（关系抽取）（PART III）

AINLP

25+阅读 · 2019年8月21日

【论文】Awesome Relation Classification Paper（关系分类）（PART I）

【论文】Awesome Relation Classification Paper（关系分类）（PART I）

AINLP

5+阅读 · 2019年8月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

人工智能 | SCI期刊专刊信息3条

人工智能 | SCI期刊专刊信息3条

Call4Papers

5+阅读 · 2019年1月10日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

相关论文

Contextualizing Variation in Text Style Transfer Datasets

Arxiv

0+阅读 · 2021年8月17日

MATCH: Metadata-Aware Text Classification in A Large Hierarchy

Arxiv

12+阅读 · 2021年2月15日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

Text Classification Algorithms: A Survey

Arxiv

16+阅读 · 2020年5月20日

Graph Convolutional Networks for Text Classification

Arxiv

11+阅读 · 2018年10月17日

Universal Language Model Fine-tuning for Text Classification

Arxiv

3+阅读 · 2018年5月23日

CERES: Distantly Supervised Relation Extraction from the Semi-Structured Web

Arxiv

6+阅读 · 2018年4月12日

Simultaneously Self-Attending to All Mentions for Full-Abstract Biological Relation Extraction

Arxiv

9+阅读 · 2018年2月28日

Fine-tuned Language Models for Text Classification

Arxiv

5+阅读 · 2018年1月18日

Arbitrarily-Oriented Text Recognition

Arxiv

3+阅读 · 2017年11月12日

微信扫码咨询专知VIP会员