泰语拼写语义错误 (Misspelling Semantics In Thai) - 专知论文

会员服务 ·

0

Analysis · 可理解性 · 讲稿 · 掩码自编码MAE · Boosting（一种模型训练加速方式） ·

2022 年 6 月 20 日

Misspelling Semantics In Thai

翻译：泰语拼写语义错误

Pakawat Nakwijit,Matthew Purver

from arxiv, To be published in LREC2022

User-generated content is full of misspellings. Rather than being just random noise, we hypothesise that many misspellings contain hidden semantics that can be leveraged for language understanding tasks. This paper presents a fine-grained annotated corpus of misspelling in Thai, together with an analysis of misspelling intention and its possible semantics to get a better understanding of the misspelling patterns observed in the corpus. In addition, we introduce two approaches to incorporate the semantics of misspelling: Misspelling Average Embedding (MAE) and Misspelling Semantic Tokens (MST). Experiments on a sentiment analysis task confirm our overall hypothesis: additional semantics from misspelling can boost the micro F1 score up to 0.4-2%, while blindly normalising misspelling is harmful and suboptimal.

翻译：用户生成的内容充满了拼写错误。我们假设许多拼写错误包含隐藏的语义, 可用于语言理解任务。本文展示了泰国语拼写错误的精细图, 并分析了拼写错误的意图及其可能的语义, 以便更好地了解在文体中观察到的拼写错误模式。此外, 我们引入了两种方法, 以纳入拼写错误的语义: 拼写错误平均嵌入( MAE) 和拼写语语语调( MST) 。对情绪分析任务进行的实验证实了我们的总体假设: 从拼写错误中增加的语义可以将微F1的评分提高到0. 4-2 %, 而盲目地使拼写错误正常化是有害和次优的。

0

相关内容

Analysis

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

ExBert — 可视化分析Transformer学到的表示

ExBert — 可视化分析Transformer学到的表示

专知会员服务

32+阅读 · 2019年10月16日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

中国图象图形学学会CSIG

1+阅读 · 2021年11月11日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

GTAT4和Myocardin相互作用调控心肌肥厚

国家自然科学基金

0+阅读 · 2014年12月31日

Cbl家族调控c-Met介导的非小细胞肺癌放疗抵抗机制的研究

国家自然科学基金

1+阅读 · 2014年12月31日

长链非编码RNA-VEC1340靶定KLF4在血管内皮细胞损伤中的调控及机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

CNTs/ La^3+掺杂TiO_2纳米纤维的制备及对重金属去除机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

BAFF干扰的树突状细胞参与自身免疫性关节炎免疫耐受的作用和机制

国家自然科学基金

0+阅读 · 2012年12月31日

基于NaGdF4纳米晶体的磁共振分子影像探针的构建与性质研究

国家自然科学基金

0+阅读 · 2012年12月31日

S1P联合PR-MSCs移植在治疗小鼠急性心肌梗死中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

PP2A介导的组蛋白去磷酸化对DNA损伤修复的调控

国家自然科学基金

0+阅读 · 2009年12月31日

TR3相互作用新蛋白机理研究

国家自然科学基金

1+阅读 · 2008年12月31日

RGC-32参与TGF-β#35825;导肾小管上皮向间充质细胞转化的分子调控机制

国家自然科学基金

0+阅读 · 2008年12月31日

Boosting Video-Text Retrieval with Explicit High-Level Semantics

Arxiv

0+阅读 · 2022年8月8日

A Map of Diverse Synthetic Stable Roommates Instances

Arxiv

0+阅读 · 2022年8月8日

ASTA: Learning Analytical Semantics over Tables for Intelligent Data Analysis and Visualization

Arxiv

0+阅读 · 2022年8月8日

CSSAM:Code Search via Attention Matching of Code Semantics and Structures

Arxiv

0+阅读 · 2022年8月8日

SelfCoLearn: Self-supervised collaborative learning for accelerating dynamic MR imaging

Arxiv

0+阅读 · 2022年8月8日

Generalizable Medical Image Segmentation via Random Amplitude Mixup and Domain-Specific Image Restoration

Arxiv

0+阅读 · 2022年8月8日

Communication Beyond Transmitting Bits: Semantics-Guided Source and Channel Coding

Arxiv

0+阅读 · 2022年8月4日

A Class of Dimension-free Metrics for the Convergence of Empirical Measures

Arxiv

0+阅读 · 2022年8月4日

Learning in the Frequency Domain

Learning in the Frequency Domain

Arxiv

11+阅读 · 2020年3月12日

Linguistically-Informed Self-Attention for Semantic Role Labeling

Arxiv

17+阅读 · 2018年8月28日

VIP会员

文章信息

相关主题

掩码自编码MAE

Boosting（一种模型训练加速方式）

相关VIP内容

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

ExBert — 可视化分析Transformer学到的表示

ExBert — 可视化分析Transformer学到的表示

专知会员服务

32+阅读 · 2019年10月16日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《乌克兰的战术侦察打击：对美国陆军启示》报告

《多域作战环境下通过军民合作推进U空间发展》报告

《无人机蜂群在模拟战斗环境中对任务效能的影响》50页

《第一人称视角武装无人机的作战飞行艺术与科学》报告

相关资讯

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

中国图象图形学学会CSIG

1+阅读 · 2021年11月11日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Boosting Video-Text Retrieval with Explicit High-Level Semantics

Arxiv

0+阅读 · 2022年8月8日

A Map of Diverse Synthetic Stable Roommates Instances

Arxiv

0+阅读 · 2022年8月8日

ASTA: Learning Analytical Semantics over Tables for Intelligent Data Analysis and Visualization

Arxiv

0+阅读 · 2022年8月8日

CSSAM:Code Search via Attention Matching of Code Semantics and Structures

Arxiv

0+阅读 · 2022年8月8日

SelfCoLearn: Self-supervised collaborative learning for accelerating dynamic MR imaging

Arxiv

0+阅读 · 2022年8月8日

Generalizable Medical Image Segmentation via Random Amplitude Mixup and Domain-Specific Image Restoration

Arxiv

0+阅读 · 2022年8月8日

Communication Beyond Transmitting Bits: Semantics-Guided Source and Channel Coding

Arxiv

0+阅读 · 2022年8月4日

A Class of Dimension-free Metrics for the Convergence of Empirical Measures

Arxiv

0+阅读 · 2022年8月4日

Learning in the Frequency Domain

Learning in the Frequency Domain

Arxiv

11+阅读 · 2020年3月12日

Linguistically-Informed Self-Attention for Semantic Role Labeling

Arxiv

17+阅读 · 2018年8月28日

相关基金

GTAT4和Myocardin相互作用调控心肌肥厚

国家自然科学基金

0+阅读 · 2014年12月31日

Cbl家族调控c-Met介导的非小细胞肺癌放疗抵抗机制的研究

国家自然科学基金

1+阅读 · 2014年12月31日

长链非编码RNA-VEC1340靶定KLF4在血管内皮细胞损伤中的调控及机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

CNTs/ La^3+掺杂TiO_2纳米纤维的制备及对重金属去除机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

BAFF干扰的树突状细胞参与自身免疫性关节炎免疫耐受的作用和机制

国家自然科学基金

0+阅读 · 2012年12月31日

基于NaGdF4纳米晶体的磁共振分子影像探针的构建与性质研究

国家自然科学基金

0+阅读 · 2012年12月31日

S1P联合PR-MSCs移植在治疗小鼠急性心肌梗死中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

PP2A介导的组蛋白去磷酸化对DNA损伤修复的调控

国家自然科学基金

0+阅读 · 2009年12月31日

TR3相互作用新蛋白机理研究

国家自然科学基金

1+阅读 · 2008年12月31日

RGC-32参与TGF-β#35825;导肾小管上皮向间充质细胞转化的分子调控机制

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员