Text2Time: 基于Transformer的文章发布时间预测器 (Text2Time: Transformer-based article time period predictor) - 专知论文

会员服务 ·

0

朴素贝叶斯 · 新闻 · Transformer · 贝叶斯 · BERT ·

2023 年 4 月 21 日

Text2Time: Transformer-based article time period predictor

翻译：Text2Time: 基于Transformer的文章发布时间预测器

Karthick Prasad Gunasekaran,B Chase Babrich,Saurabh Shirodkar,Hee Hwang

from arxiv, 8 Pages

We explore the problem of predicting the publication period of text document, such as a news article, using the text from that document. In order to do so, we created our own extensive labeled dataset of over 350,000 news articles published by The New York Times over six decades. We then provide an implementation of a simple Naive Bayes baseline model, which surprisingly achieves decent performance in terms of accuracy.Finally, for our approach, we use a pretrained BERT model fine-tuned for the task of text classification. This model exceeds our expectations and provides some very impressive results in terms of accurately classifying news articles into their respective publication decades. The results beat the performance of the few previously tried models for this relatively unexplored task of time prediction from text.

翻译：我们研究使用文章文本来预测发布时间的问题，例如新闻文章。为此，我们创建了自己的大型标签数据集，其中包括《纽约时报》六十年来发布的超过350,000篇文章。接着，我们提供了一个简单的朴素贝叶斯基准模型的实现，令人惊讶的是它在准确性方面表现出色。最后，我们使用一个预训练的BERT模型进行微调来实现我们的方法，这个模型超出了我们的预期，并在准确地将新闻文章分类到它们各自的出版年代方面提供了一些非常惊人的结果。结果超过了先前为这一相对未被探索的文本时间预测任务尝试的少数模型在准确性上的表现。

0

相关内容

朴素贝叶斯

朴素贝叶斯

朴素贝叶斯法是基于贝叶斯定理与特征条件独立假设的分类方法。对于给定的训练数据集，首先基于“特征条件独立”的假设学习输入/输出的联合概率分布。然后基于此模型，对给定输入x，利用贝叶斯定理求后验概率最大的y。朴素贝叶斯实现简单，学习与预测的效率都很高，是一种常用的方法。

百篇论文纵览大型语言模型最新研究进展

百篇论文纵览大型语言模型最新研究进展

专知会员服务

70+阅读 · 2023年3月31日

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

75+阅读 · 2022年6月28日

计算机科学课程与视频课件合集，Computer Science courses with video lectures

计算机科学课程与视频课件合集，Computer Science courses with video lectures

专知会员服务

37+阅读 · 2022年1月24日

预训练语言模型fine-tuning近期进展概述

预训练语言模型fine-tuning近期进展概述

专知会员服务

40+阅读 · 2021年4月9日

【GPT-3作者亲解】超大型语言模型少样本学习，109页ppt

【GPT-3作者亲解】超大型语言模型少样本学习，109页ppt

专知会员服务

109+阅读 · 2020年12月19日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

Yann Lecun 纽约大学《深度学习(PyTorch)》课程(2020）PPT

Yann Lecun 纽约大学《深度学习(PyTorch)》课程(2020）PPT

专知会员服务

183+阅读 · 2020年3月16日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

全蛋白质水凝胶机械性能的调控及在生物医学中的应用

国家自然科学基金

0+阅读 · 2014年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

具有临界指数的Schrodinger-Poisson系统的解

国家自然科学基金

0+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

SRSF10调控的选择性剪接在脂肪细胞分化中的调控作用

国家自然科学基金

0+阅读 · 2012年12月31日

具有微孔、介孔和大孔的多级孔MOF材料的设计合成与应用研究

国家自然科学基金

0+阅读 · 2012年12月31日

轴对称的Navier-Stokes方程

国家自然科学基金

1+阅读 · 2011年12月31日

PCOS患者卵巢颗粒细胞对卵子及早期胚胎发育潜能的基因调控

国家自然科学基金

0+阅读 · 2011年12月31日

脂肪细胞分化的分子调控网络的系统生物学研究

国家自然科学基金

0+阅读 · 2011年12月31日

Reality-based Interaction用户界面模型和评估方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

Emotion-Conditioned Melody Harmonization with Hierarchical Variational Autoencoder

Arxiv

0+阅读 · 2023年6月6日

HireVAE: An Online and Adaptive Factor Model Based on Hierarchical and Regime-Switch VAE

Arxiv

0+阅读 · 2023年6月5日

Optimal Rate-Matrix Pruning For Large-Scale Heterogeneous Systems

Arxiv

0+阅读 · 2023年6月2日

Model-Free Error Assessment for Breadth-First Studies, with Applications to Cell-Perturbation Experiments

Arxiv

0+阅读 · 2023年6月2日

Granular Gym: High Performance Simulation for Robotic Tasks with Granular Materials

Arxiv

0+阅读 · 2023年6月2日

Sequential change-point detection: Computation versus statistical performance

Arxiv

0+阅读 · 2023年6月1日

Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications

Arxiv

20+阅读 · 2023年2月1日

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

Arxiv

17+阅读 · 2020年6月2日

Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling

Arxiv

11+阅读 · 2018年6月16日

Multilingual Sentiment Analysis: An RNN-Based Framework for Limited Data

Arxiv

12+阅读 · 2018年6月8日

VIP会员

文章信息

相关主题

朴素贝叶斯

相关VIP内容

百篇论文纵览大型语言模型最新研究进展

百篇论文纵览大型语言模型最新研究进展

专知会员服务

70+阅读 · 2023年3月31日

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

75+阅读 · 2022年6月28日

计算机科学课程与视频课件合集，Computer Science courses with video lectures

计算机科学课程与视频课件合集，Computer Science courses with video lectures

专知会员服务

37+阅读 · 2022年1月24日

预训练语言模型fine-tuning近期进展概述

预训练语言模型fine-tuning近期进展概述

专知会员服务

40+阅读 · 2021年4月9日

【GPT-3作者亲解】超大型语言模型少样本学习，109页ppt

【GPT-3作者亲解】超大型语言模型少样本学习，109页ppt

专知会员服务

109+阅读 · 2020年12月19日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

Yann Lecun 纽约大学《深度学习(PyTorch)》课程(2020）PPT

Yann Lecun 纽约大学《深度学习(PyTorch)》课程(2020）PPT

专知会员服务

183+阅读 · 2020年3月16日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

新书册《几何深度学习的数学基础》

中程单向攻击无人机的战略意义：俄乌战争启示

在无标注条件下适配视觉—语言模型：全面综述

面向视觉语言模型的持续学习：遗忘之外的综述与分类体系

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

相关论文

Emotion-Conditioned Melody Harmonization with Hierarchical Variational Autoencoder

Arxiv

0+阅读 · 2023年6月6日

HireVAE: An Online and Adaptive Factor Model Based on Hierarchical and Regime-Switch VAE

Arxiv

0+阅读 · 2023年6月5日

Optimal Rate-Matrix Pruning For Large-Scale Heterogeneous Systems

Arxiv

0+阅读 · 2023年6月2日

Model-Free Error Assessment for Breadth-First Studies, with Applications to Cell-Perturbation Experiments

Arxiv

0+阅读 · 2023年6月2日

Granular Gym: High Performance Simulation for Robotic Tasks with Granular Materials

Arxiv

0+阅读 · 2023年6月2日

Sequential change-point detection: Computation versus statistical performance

Arxiv

0+阅读 · 2023年6月1日

Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications

Arxiv

20+阅读 · 2023年2月1日

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

Arxiv

17+阅读 · 2020年6月2日

Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling

Arxiv

11+阅读 · 2018年6月16日

Multilingual Sentiment Analysis: An RNN-Based Framework for Limited Data

Arxiv

12+阅读 · 2018年6月8日

相关基金

全蛋白质水凝胶机械性能的调控及在生物医学中的应用

国家自然科学基金

0+阅读 · 2014年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

具有临界指数的Schrodinger-Poisson系统的解

国家自然科学基金

0+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

SRSF10调控的选择性剪接在脂肪细胞分化中的调控作用

国家自然科学基金

0+阅读 · 2012年12月31日

具有微孔、介孔和大孔的多级孔MOF材料的设计合成与应用研究

国家自然科学基金

0+阅读 · 2012年12月31日

轴对称的Navier-Stokes方程

国家自然科学基金

1+阅读 · 2011年12月31日

PCOS患者卵巢颗粒细胞对卵子及早期胚胎发育潜能的基因调控

国家自然科学基金

0+阅读 · 2011年12月31日

脂肪细胞分化的分子调控网络的系统生物学研究

国家自然科学基金

0+阅读 · 2011年12月31日

Reality-based Interaction用户界面模型和评估方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员