面向结构的无监督社会媒体网站排网战略 (A Structure-Oriented Unsupervised Crawling Strategy for Social Media Sites) - 专知论文

会员服务 ·

0

无监督 · Sitemap · Better · 学成 · 无监督学习 ·

2018 年 4 月 8 日

A Structure-Oriented Unsupervised Crawling Strategy for Social Media Sites

翻译：面向结构的无监督社会媒体网站排网战略

Keyang Xu,Kyle Yingkai Gao,Jamie Callan

Existing techniques for efficiently crawling social media sites rely on URL patterns, query logs, and human supervision. This paper describes SOUrCe, a structure-oriented unsupervised crawler that uses page structures to learn how to crawl a social media site efficiently. SOUrCe consists of two stages. During its unsupervised learning phase, SOUrCe constructs a sitemap that clusters pages based on their structural similarity and generates a navigation table that describes how the different types of pages in the site are linked together. During its harvesting phase, it uses the navigation table and a crawling policy to guide the choice of which links to crawl next. Experiments show that this architecture supports different styles of crawling efficiently, and does a better job of staying focused on user-created contents than baseline methods.

翻译：高效爬行社交媒体网站的现有技术依赖于 URL 模式、查询日志和人文监督。本文描述SOUrCe, 是一个结构导向型、不受监督的爬行者, 使用页面结构来学习如何有效爬行社交媒体网站。 SOUrCe 由两个阶段组成。在未受监督的学习阶段, SOUrCe 构建了一个基于结构相似性的集成页面的站点图, 并生成一个导航表, 描述网站不同类型页面是如何连接在一起的。在采集阶段, 它使用导航表和爬行政策来指导选择下一个链接。实验显示, 这一架构支持了不同爬行风格, 并且比基线方法更好地关注用户创建的内容。

0

相关内容

无监督

【ACL2020】多模态信息抽取，365页ppt

【ACL2020】多模态信息抽取，365页ppt

专知会员服务

150+阅读 · 2020年7月6日

【CMU-Google-斯坦福】可控行为的弱监督强化学习，Weakly-Supervised RL

【CMU-Google-斯坦福】可控行为的弱监督强化学习，Weakly-Supervised RL

专知会员服务

22+阅读 · 2020年4月8日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【图解自监督学习】《The Illustrated Self-Supervised Learning》by Amit Chaudhary

【图解自监督学习】《The Illustrated Self-Supervised Learning》by Amit Chaudhary

专知会员服务

43+阅读 · 2020年2月25日

【华盛顿大学】知识建模+生成式推理，60页ppt，Cracking Commonsense Intelligence with Knowledge Modeling + Generative Reasoning

【华盛顿大学】知识建模+生成式推理，60页ppt，Cracking Commonsense Intelligence with Knowledge Modeling + Generative Reasoning

专知会员服务

54+阅读 · 2019年12月27日

【中科院计算所 | 文献综述】自然语言生成的无监督前训练:文献综述，Unsupervised Pre-training for Natural Language Generation: A Literature Review

【中科院计算所 | 文献综述】自然语言生成的无监督前训练:文献综述，Unsupervised Pre-training for Natural Language Generation: A Literature Review

专知会员服务

48+阅读 · 2019年11月15日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

人工智能 | 国际会议信息10条

人工智能 | 国际会议信息10条

Call4Papers

5+阅读 · 2018年12月18日

人工智能 | PRICAI 2019等国际会议信息9条

人工智能 | PRICAI 2019等国际会议信息9条

Call4Papers

6+阅读 · 2018年12月13日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

How Useful is Self-Supervised Pretraining for Visual Tasks?

How Useful is Self-Supervised Pretraining for Visual Tasks?

Arxiv

9+阅读 · 2020年3月31日

Visual Grounding in Video for Unsupervised Word Translation

Visual Grounding in Video for Unsupervised Word Translation

Arxiv

7+阅读 · 2020年3月11日

A Survey of Reinforcement Learning Techniques: Strategies, Recent Development, and Future Directions

A Survey of Reinforcement Learning Techniques: Strategies, Recent Development, and Future Directions

Arxiv

80+阅读 · 2020年1月19日

The Measure of Intelligence

The Measure of Intelligence

Arxiv

8+阅读 · 2019年11月5日

Learning to Predict the Cosmological Structure Formation

Arxiv

3+阅读 · 2018年11月15日

Unsupervised Meta-Learning for Reinforcement Learning

Arxiv

8+阅读 · 2018年6月12日

Learning Unsupervised Learning Rules

Arxiv

7+阅读 · 2018年5月23日

PEORL: Integrating Symbolic Planning and Hierarchical Reinforcement Learning for Robust Decision-Making

Arxiv

6+阅读 · 2018年4月20日

CERES: Distantly Supervised Relation Extraction from the Semi-Structured Web

Arxiv

6+阅读 · 2018年4月12日

The Case for Automatic Database Administration using Deep Reinforcement Learning

Arxiv

3+阅读 · 2018年1月17日

VIP会员

文章信息

相关主题

无监督学习

相关VIP内容

【ACL2020】多模态信息抽取，365页ppt

【ACL2020】多模态信息抽取，365页ppt

专知会员服务

150+阅读 · 2020年7月6日

【CMU-Google-斯坦福】可控行为的弱监督强化学习，Weakly-Supervised RL

【CMU-Google-斯坦福】可控行为的弱监督强化学习，Weakly-Supervised RL

专知会员服务

22+阅读 · 2020年4月8日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【图解自监督学习】《The Illustrated Self-Supervised Learning》by Amit Chaudhary

【图解自监督学习】《The Illustrated Self-Supervised Learning》by Amit Chaudhary

专知会员服务

43+阅读 · 2020年2月25日

【华盛顿大学】知识建模+生成式推理，60页ppt，Cracking Commonsense Intelligence with Knowledge Modeling + Generative Reasoning

【华盛顿大学】知识建模+生成式推理，60页ppt，Cracking Commonsense Intelligence with Knowledge Modeling + Generative Reasoning

专知会员服务

54+阅读 · 2019年12月27日

【中科院计算所 | 文献综述】自然语言生成的无监督前训练:文献综述，Unsupervised Pre-training for Natural Language Generation: A Literature Review

【中科院计算所 | 文献综述】自然语言生成的无监督前训练:文献综述，Unsupervised Pre-training for Natural Language Generation: A Literature Review

专知会员服务

48+阅读 · 2019年11月15日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

美陆军五大转型方向

一种Agent自主性风险评估框架 | 最新文献

实时无人机指令处理：一种面向无人机系统的大语言模型方法

基于动态知识图谱的人工智能代理自主研究周期 | 文献

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

人工智能 | 国际会议信息10条

人工智能 | 国际会议信息10条

Call4Papers

5+阅读 · 2018年12月18日

人工智能 | PRICAI 2019等国际会议信息9条

人工智能 | PRICAI 2019等国际会议信息9条

Call4Papers

6+阅读 · 2018年12月13日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

How Useful is Self-Supervised Pretraining for Visual Tasks?

How Useful is Self-Supervised Pretraining for Visual Tasks?

Arxiv

9+阅读 · 2020年3月31日

Visual Grounding in Video for Unsupervised Word Translation

Visual Grounding in Video for Unsupervised Word Translation

Arxiv

7+阅读 · 2020年3月11日

A Survey of Reinforcement Learning Techniques: Strategies, Recent Development, and Future Directions

A Survey of Reinforcement Learning Techniques: Strategies, Recent Development, and Future Directions

Arxiv

80+阅读 · 2020年1月19日

The Measure of Intelligence

The Measure of Intelligence

Arxiv

8+阅读 · 2019年11月5日

Learning to Predict the Cosmological Structure Formation

Arxiv

3+阅读 · 2018年11月15日

Unsupervised Meta-Learning for Reinforcement Learning

Arxiv

8+阅读 · 2018年6月12日

Learning Unsupervised Learning Rules

Arxiv

7+阅读 · 2018年5月23日

PEORL: Integrating Symbolic Planning and Hierarchical Reinforcement Learning for Robust Decision-Making

Arxiv

6+阅读 · 2018年4月20日

CERES: Distantly Supervised Relation Extraction from the Semi-Structured Web

Arxiv

6+阅读 · 2018年4月12日

The Case for Automatic Database Administration using Deep Reinforcement Learning

Arxiv

3+阅读 · 2018年1月17日

微信扫码咨询专知VIP会员