Pushing on Text Readability Assessment: A Transformer Meets Handcrafted Linguistic Features


September 25, 2021

Bruce W. Lee, Yoo Sung Jang, Jason Hyung-Jong Lee

From arXiv. 18 pages, 3 figures. Empirical Methods in Natural Language Processing (EMNLP) 2021, Main Conference.

We report two essential improvements in readability assessment: (1) three novel features in advanced semantics and (2) timely evidence that traditional ML models (e.g., Random Forest, using handcrafted features) can be combined with transformers (e.g., RoBERTa) to improve model performance. First, we explore suitable transformers and traditional ML models. Then, we extract 255 handcrafted linguistic features using self-developed extraction software. Finally, we assemble these into several hybrid models, achieving state-of-the-art (SOTA) accuracy on popular readability assessment datasets. The use of handcrafted features helps model performance on smaller datasets. Notably, our RoBERTA-RF-T1 hybrid achieves near-perfect classification accuracy of 99%, a 20.3% increase over the previous SOTA.
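The abstract does not say how the transformer and the handcrafted features are joined, so the following is only a minimal sketch under one assumption: the transformer's class probabilities are concatenated with a handcrafted feature vector, and the result would then be passed to a traditional classifier such as a Random Forest. The function names `handcrafted_features` and `hybrid_vector`, the three toy surface features, and the probability values are all hypothetical; they are not the authors' 255 features or their extraction software.

```python
import re
from typing import List


def handcrafted_features(text: str) -> List[float]:
    """Compute three toy surface features (hypothetical stand-ins,
    not the 255 features described in the abstract)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not sentences or not words:
        return [0.0, 0.0, 0.0]
    avg_sentence_len = len(words) / len(sentences)        # words per sentence
    avg_word_len = sum(len(w) for w in words) / len(words)  # characters per word
    type_token_ratio = len(set(words)) / len(words)       # lexical variety
    return [avg_sentence_len, avg_word_len, type_token_ratio]


def hybrid_vector(text: str, transformer_probs: List[float]) -> List[float]:
    """Assumed combination scheme: concatenate the transformer's class
    probabilities with the handcrafted features."""
    return list(transformer_probs) + handcrafted_features(text)


# Placeholder probabilities, made up for illustration; in practice they
# would come from a fine-tuned transformer such as RoBERTa.
probs = [0.7, 0.2, 0.1]
vec = hybrid_vector("The cat sat. The cat ran away!", probs)
print(vec)
```

Each resulting vector could serve as one training row for a traditional model; keeping the two feature sources in a single flat vector is what lets a tree-based classifier weigh the transformer's output against the linguistic features.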
