Reading Report | Deep Learning for Extreme Multi-label Text Classification


January 10, 2018 · 科技创新与创业 · 朱纪乐

Venue: SIGIR 2017 (full paper)

Link: https://dl.acm.org/citation.cfm?id=3080834

I. Background

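1. Research background: Multi-label classification differs in nature from binary and multi-class classification. Data sparsity is also more severe in the multi-label setting than in single-label classification, which makes the dependencies between labels hard to learn.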

2. Research problem: Extreme Multi-label Text Classification (XMTC) is the task of finding, for each document, the few most relevant labels in a very large label space (for example, Wikipedia).
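To make the problem statement concrete, the following is a minimal sketch of the final step of an XMTC system: given one relevance score per label, return the few highest-scoring labels. The function name `top_k_labels` and the toy scores are assumptions made for illustration; they do not come from the paper.

```python
import numpy as np

def top_k_labels(scores, k):
    """Return the indices of the k highest-scoring labels, best first."""
    scores = np.asarray(scores, dtype=float)
    k = min(k, scores.size)
    # argpartition selects the k largest scores without sorting the
    # whole label space, which matters when there are very many labels.
    candidates = np.argpartition(-scores, k - 1)[:k]
    # Only the k selected candidates are sorted by descending score.
    return candidates[np.argsort(-scores[candidates])]

# Toy example: 5 labels instead of hundreds of thousands.
scores = np.array([0.1, 0.9, 0.05, 0.7, 0.3])
top = top_k_labels(scores, 3)
print(top)
```

In a real system the scores would come from a trained model; here they are hard-coded purely to show the selection step.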

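3. Related work: The more mature earlier methods fall mainly into two families, target-embedding and tree-based ensembles. The methods discussed include SLEEC, FastXML, FastText, CNN-Kim, Bow-CNN, and PD-Sparse. Deep learning had already been applied to text classification, but not to the XMTC problem.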

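4. Main contributions: 1) an experimental comparison of 7 baselines on 6 datasets; 2) XML-CNN, which exploits multi-label co-occurrence and optimizes both the loss and the network structure; 3) experiments demonstrating the effectiveness of the model on the XMTC task.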

II. Model

1. Basic framework: The proposed model is an improvement on CNN-Kim, extending it from multi-class to multi-label classification.
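The difference between a multi-class and a multi-label output layer can be illustrated with a small sketch. It assumes the usual formulation: softmax for multi-class, where labels compete for a single unit of probability, and an independent sigmoid per label for multi-label. The logits below are made up for illustration.

```python
import numpy as np

def softmax(z):
    """Multi-class output: probabilities sum to 1, so labels compete."""
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def sigmoid(z):
    """Multi-label output: each label is scored independently in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([2.0, 1.5, -3.0])  # hypothetical scores for 3 labels
p_multiclass = softmax(logits)
p_multilabel = sigmoid(logits)

print(p_multiclass.round(3))  # exactly one label is expected to be correct
print(p_multilabel.round(3))  # several labels can be likely at once
```

With softmax, two strongly supported labels must share the probability mass; with sigmoids, both can score close to 1.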

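2. Model details: Pooling uses chunk-max pooling (the paper calls it dynamic max pooling); the loss function is an extension of cross entropy to the multi-label case; and a fully connected hidden layer, which the paper calls the Hidden Bottleneck Layer, is inserted between the pooling layer and the output layer.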
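The three components above can be sketched as a single forward pass in NumPy. This is an illustrative sketch, not the authors' implementation: the layer sizes, the random stand-in for the convolution output, the ReLU activation in the bottleneck, and the helper names `chunk_max_pool` and `bce_loss` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def chunk_max_pool(feature_map, num_chunks):
    """Split a 1-D feature map into contiguous chunks and keep each chunk's max.

    Assumes num_chunks <= len(feature_map) so that no chunk is empty.
    """
    chunks = np.array_split(feature_map, num_chunks)
    return np.array([c.max() for c in chunks])

def bce_loss(logits, targets):
    """Binary cross entropy over independent sigmoid outputs, averaged over labels."""
    # Numerically stable form of -[y*log(s) + (1-y)*log(1-s)], s = sigmoid(z).
    return np.mean(np.maximum(logits, 0) - logits * targets
                   + np.log1p(np.exp(-np.abs(logits))))

# Hypothetical sizes, chosen only to keep the example small.
num_filters, seq_len, num_chunks = 4, 12, 3
bottleneck_dim, num_labels = 5, 8

# Stand-in for the convolution output: one feature map per filter.
conv_out = rng.standard_normal((num_filters, seq_len))

# Chunk-max pooling keeps num_chunks values per filter instead of one.
pooled = np.concatenate([chunk_max_pool(f, num_chunks) for f in conv_out])

# Hidden bottleneck layer between pooling and output.
W_h = rng.standard_normal((bottleneck_dim, pooled.size)) * 0.1
W_o = rng.standard_normal((num_labels, bottleneck_dim)) * 0.1
hidden = np.maximum(W_h @ pooled, 0.0)  # ReLU is an assumed activation
logits = W_o @ hidden

# Multi-hot target: this document carries labels 1 and 5.
targets = np.zeros(num_labels)
targets[[1, 5]] = 1.0
loss = bce_loss(logits, targets)
print(pooled.shape, logits.shape, float(loss))
```

One reason such a bottleneck helps at scale: connecting the pooled vector directly to the outputs needs `pooled.size * num_labels` weights, whereas the bottleneck needs `pooled.size * bottleneck_dim + bottleneck_dim * num_labels`, which is far smaller when the label count is in the hundreds of thousands and the bottleneck is narrow.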

III. Experimental Results

1. Datasets: 6 benchmarks that differ in number of samples, number of labels, and text length.

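2. Taken together, the experimental results suggest that XML-CNN can cope with the data sparsity that arises when the number of labels is very large; that the purpose-built pooling, network structure, and loss each make a positive contribution; and that training time is not excessive.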

Author: 朱纪乐, master's student at Peking University; research interests: educational data mining and recommender systems.
