文本分类的正正对匹配跟踪 (Orthogonal Matching Pursuit for Text Classification) - 专知论文

会员服务 ·

0

文本分类 · 正则化项 · 正交 · 特化 · GROUP ·

2018 年 7 月 12 日

Orthogonal Matching Pursuit for Text Classification

翻译：文本分类的正正对匹配跟踪

Konstantinos Skianis,Nikolaos Tziortziotis,Michalis Vazirgiannis

In text classification, the problem of overfitting arises due to the high dimensionality, making regularization essential. Although classic regularizers provide sparsity, they fail to return highly accurate models. On the contrary, state-of-the-art group-lasso regularizers provide better results at the expense of low sparsity. In this paper, we apply a greedy variable selection algorithm, called Orthogonal Matching Pursuit, for the text classification task. We also extend standard group OMP by introducing overlapping group OMP to handle overlapping groups of features. Empirical analysis verifies that both OMP and overlapping GOMP constitute powerful regularizers, able to produce effective and super-sparse models. Code and data are available at: https://www.dropbox.com/sh/7w7hjns71ol0xrz/AAC\string_G0\string_0DlcGkq6tQb2zqAaca\string?dl\string=0 .

翻译：在文本分类中,由于高维度,过度装配问题出现于文本分类中,使规范化变得必不可少。虽然典型的正规化者提供宽度,但他们未能返回非常准确的模型。相反,最先进的群集-lasso正规化者以低宽度为代价提供了更好的结果。在本文中,我们为文本分类任务采用了一种贪婪的变量选择算法,称为“正弦匹配追求”。我们还通过引入重叠的组 OMP 来扩展标准组 OMP, 处理重叠的特征组。经验分析证实 OMP 和重叠的 GOMP 构成强大的正规化者, 能够产生有效和超精细的模型。代码和数据可以在 https://www.droppox.com/sh/7w7jjnsol710xrz/ AACstring_G0\string_0DlcGkqq6tQ2zq\\qstring? dl\string=0 。

6

相关内容

文本分类

文本分类（Text Classification）任务是根据给定文档的内容或主题，自动分配预先定义的类别标签。

【微软亚洲研究院】无监督词嵌入对齐的几何感知域自适应，Geometry-aware Domain Adaptation for Unsupervised Alignment of Word Embeddings

【微软亚洲研究院】无监督词嵌入对齐的几何感知域自适应，Geometry-aware Domain Adaptation for Unsupervised Alignment of Word Embeddings

专知会员服务

23+阅读 · 2020年4月21日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【微软雷德蒙研究院】对抗机器学习工业视角，Adversarial Machine Learning - Industry Perspectives

【微软雷德蒙研究院】对抗机器学习工业视角，Adversarial Machine Learning - Industry Perspectives

专知会员服务

12+阅读 · 2020年2月23日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Diganta Misra等人提出新激活函数Mish，在一些任务上超越RuLU

Diganta Misra等人提出新激活函数Mish，在一些任务上超越RuLU

专知会员服务

15+阅读 · 2019年10月15日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

PaperWeekly

7+阅读 · 2019年5月5日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

LibRec 精选：推荐系统的论文与源码

LibRec 精选：推荐系统的论文与源码

LibRec智能推荐

14+阅读 · 2018年11月29日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

读书报告 | Deep Learning for Extreme Multi-label Text Classification

读书报告 | Deep Learning for Extreme Multi-label Text Classification

科技创新与创业

48+阅读 · 2018年1月10日

Highway Networks For Sentence Classification

Highway Networks For Sentence Classification

哈工大SCIR

4+阅读 · 2017年9月30日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Deep Learning for Hindi Text Classification: A Comparison

Arxiv

4+阅读 · 2020年1月19日

A Sketch-Based System for Semantic Parsing

A Sketch-Based System for Semantic Parsing

Arxiv

4+阅读 · 2019年9月12日

Adversarial Representation Learning for Text-to-Image Matching

Adversarial Representation Learning for Text-to-Image Matching

Arxiv

6+阅读 · 2019年8月28日

SPM-Tracker: Series-Parallel Matching for Real-Time Visual Object Tracking

SPM-Tracker: Series-Parallel Matching for Real-Time Visual Object Tracking

Arxiv

3+阅读 · 2019年4月9日

Learning to Weight for Text Classification

Learning to Weight for Text Classification

Arxiv

8+阅读 · 2019年3月28日

Graph Convolutional Networks for Text Classification

Arxiv

12+阅读 · 2018年9月15日

Horizontal Pyramid Matching for Person Re-identification

Arxiv

3+阅读 · 2018年4月30日

Cross-Domain Image Matching with Deep Feature Maps

Arxiv

14+阅读 · 2018年4月6日

Matching Networks for One Shot Learning

Arxiv

10+阅读 · 2017年12月29日

MatchZoo: A Toolkit for Deep Text Matching

Arxiv

5+阅读 · 2017年7月23日

VIP会员

文章信息

相关主题

相关VIP内容

【微软亚洲研究院】无监督词嵌入对齐的几何感知域自适应，Geometry-aware Domain Adaptation for Unsupervised Alignment of Word Embeddings

【微软亚洲研究院】无监督词嵌入对齐的几何感知域自适应，Geometry-aware Domain Adaptation for Unsupervised Alignment of Word Embeddings

专知会员服务

23+阅读 · 2020年4月21日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【微软雷德蒙研究院】对抗机器学习工业视角，Adversarial Machine Learning - Industry Perspectives

【微软雷德蒙研究院】对抗机器学习工业视角，Adversarial Machine Learning - Industry Perspectives

专知会员服务

12+阅读 · 2020年2月23日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Diganta Misra等人提出新激活函数Mish，在一些任务上超越RuLU

Diganta Misra等人提出新激活函数Mish，在一些任务上超越RuLU

专知会员服务

15+阅读 · 2019年10月15日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《美陆军徒步机动作战条令手册》最新168页

【博士论文】基于不确定性的可靠性：现代机器学习中的选择性预测与可信部署

军事后勤数字化未来展望

《美海军后勤体系整合与创新挑战》最新报告

相关资讯

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

PaperWeekly

7+阅读 · 2019年5月5日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

LibRec 精选：推荐系统的论文与源码

LibRec 精选：推荐系统的论文与源码

LibRec智能推荐

14+阅读 · 2018年11月29日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

读书报告 | Deep Learning for Extreme Multi-label Text Classification

读书报告 | Deep Learning for Extreme Multi-label Text Classification

科技创新与创业

48+阅读 · 2018年1月10日

Highway Networks For Sentence Classification

Highway Networks For Sentence Classification

哈工大SCIR

4+阅读 · 2017年9月30日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Deep Learning for Hindi Text Classification: A Comparison

Arxiv

4+阅读 · 2020年1月19日

A Sketch-Based System for Semantic Parsing

A Sketch-Based System for Semantic Parsing

Arxiv

4+阅读 · 2019年9月12日

Adversarial Representation Learning for Text-to-Image Matching

Adversarial Representation Learning for Text-to-Image Matching

Arxiv

6+阅读 · 2019年8月28日

SPM-Tracker: Series-Parallel Matching for Real-Time Visual Object Tracking

SPM-Tracker: Series-Parallel Matching for Real-Time Visual Object Tracking

Arxiv

3+阅读 · 2019年4月9日

Learning to Weight for Text Classification

Learning to Weight for Text Classification

Arxiv

8+阅读 · 2019年3月28日

Graph Convolutional Networks for Text Classification

Arxiv

12+阅读 · 2018年9月15日

Horizontal Pyramid Matching for Person Re-identification

Arxiv

3+阅读 · 2018年4月30日

Cross-Domain Image Matching with Deep Feature Maps

Arxiv

14+阅读 · 2018年4月6日

Matching Networks for One Shot Learning

Arxiv

10+阅读 · 2017年12月29日

MatchZoo: A Toolkit for Deep Text Matching

Arxiv

5+阅读 · 2017年7月23日

微信扫码咨询专知VIP会员