Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment in order to maximize cumulative reward. Alongside supervised learning and unsupervised learning, reinforcement learning is one of the three basic machine learning paradigms. It differs from supervised learning in that labeled input/output pairs need not be presented, and sub-optimal actions need not be explicitly corrected; instead, the focus is on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). The environment is typically stated in the form of a Markov decision process (MDP), because many reinforcement learning algorithms for this setting use dynamic programming techniques. The main difference between classical dynamic programming methods and reinforcement learning algorithms is that the latter do not assume an exact mathematical model of the MDP, and they target large MDPs where exact methods become infeasible.
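As a concrete illustration of these ideas, here is a minimal sketch of tabular Q-learning on a toy five-state chain MDP. The environment, hyperparameters, and function names are all illustrative, not taken from any particular reference: the agent learns action values purely from sampled transitions, without a model of the MDP's dynamics, and balances exploration and exploitation with an epsilon-greedy rule.

```python
import random

# A minimal chain MDP: states 0..4, actions 0 = left, 1 = right.
# Reaching state 4 yields reward +1 and ends the episode.
def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(4, state + 1)
    return nxt, (1.0 if nxt == 4 else 0.0), nxt == 4

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(5)]  # Q[state][action]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Exploration vs. exploitation via an epsilon-greedy choice.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: q[s][x])
            s2, r, done = step(s, a)
            # Model-free update: no transition probabilities of the MDP are used.
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = q_learning()
policy = [max((0, 1), key=lambda a: q[s][a]) for s in range(4)]
print(policy)  # the greedy policy should move right toward the goal state
```

With enough episodes, the greedy policy moves right in every non-terminal state, and Q(3, right) approaches the terminal reward of 1.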


Reinforcement Learning (强化学习): 专知 Curated Collection

Getting Started

Surveys

Advanced Papers

  1. Rasim M Alguliev, Ramiz M Aliguliyev, Makrufa S Hajirahimova, and Chingiz A Mehdiyev. 2011. MCMR: Maximum coverage and minimum redundant text summarization model. Expert Systems with Applications 38, 12 (2011), 14514–14522. [http://www.sciencedirect.com/science/article/pii/S0957417411008177]
  2. Rasim M Alguliev, Ramiz M Aliguliyev, and Nijat R Isazade. 2013. Multiple documents summarization based on evolutionary optimization algorithm. Expert Systems with Applications 40, 5 (2013), 1675–1689. [http://www.sciencedirect.com/science/article/pii/S0957417412010688]
  3. M. Allahyari, S. Pouriyeh, M. Assefi, S. Safaei, E. D. Trippe, J. B. Gutierrez, and K. Kochut. 2017. A Brief Survey of Text Mining: Classification, Clustering and Extraction Techniques. ArXiv e-prints (2017). arXiv:1707.02919 [https://arxiv.org/abs/1707.02919]
  4. Einat Amitay and Cécile Paris. 2000. Automatically summarising web sites: is there a way around it?. In Proceedings of the ninth international conference on Information and knowledge management. ACM, 173–179. [https://dl.acm.org/citation.cfm?id=354756.354816]
  5. Elena Baralis, Luca Cagliero, Saima Jabeen, Alessandro Fiori, and Sajid Shah. 2013. Multi-document summarization based on the Yago ontology. Expert Systems with Applications 40, 17 (2013), 6976–6984. [http://www.sciencedirect.com/science/article/pii/S0957417413004429]
  6. Taylor Berg-Kirkpatrick, Dan Gillick, and Dan Klein. 2011. Jointly learning to extract and compress. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1. Association for Computational Linguistics, 481–490. [https://dl.acm.org/citation.cfm?id=2002534]
  7. Asli Celikyilmaz and Dilek Hakkani-Tur. 2010. A hybrid hierarchical model for multi-document summarization. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 815–824. [https://dl.acm.org/citation.cfm?id=1858765]
  8. Ping Chen and Rakesh Verma. 2006. A query-based medical information summarization system using ontology knowledge. In Computer-Based Medical Systems, 2006. CBMS 2006. 19th IEEE International Symposium on. IEEE, 37–42. [https://dl.acm.org/citation.cfm?id=1153019]
  9. Freddy Chong Tat Chua and Sitaram Asur. 2013. Automatic Summarization of Events from Social Media.. In ICWSM. [https://www.aaai.org/ocs/index.php/ICWSM/ICWSM13/paper/view/6057/0]
  10. John M Conroy and Dianne P O’leary. 2001. Text summarization via hidden markov models. In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 406–407. [http://pdfs.semanticscholar.org/1213/3cfc6688cc2cdea57595b045a28b94d98f1d.pdf]
  11. Hal Daumé III and Daniel Marcu. 2006. Bayesian query-focused summarization. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 305–312. [https://dl.acm.org/citation.cfm?id=1220214]
  12. J-Y Delort, Bernadette Bouchon-Meunier, and Maria Rifqi. 2003. Enhanced web document summarization using hyperlinks. In Proceedings of the fourteenth ACM conference on Hypertext and hypermedia. ACM, 208–215. [http://dl.acm.org/citation.cfm?id=900097]
  13. Günes Erkan and Dragomir R Radev. 2004. LexRank: Graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res.(JAIR) 22, 1 (2004), 457–479. [https://arxiv.org/abs/1109.2128]
  14. Yihong Gong and Xin Liu. 2001. Generic text summarization using relevance measure and latent semantic analysis. In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 19–25. [https://dl.acm.org/citation.cfm?doid=383952.383955]
  15. Vishal Gupta and Gurpreet Singh Lehal. 2010. A survey of text summarization extractive techniques. Journal of Emerging Technologies in Web Intelligence 2, 3 (2010), 258–268. [http://www.learnpunjabi.org/pdf/survey-paper.pdf]
  16. Ben Hachey, Gabriel Murray, and David Reitter. 2006. Dimensionality reduction aids term co-occurrence based multi-document summarization. In Proceedings of the workshop on task-focused summarization and question answering. Association for Computational Linguistics, 1–7. [http://www.ltg.ed.ac.uk/np/publications/ltg/papers/Hachey2006Dimensionality.pdf]
  17. John Hannon, Kevin McCarthy, James Lynch, and Barry Smyth. 2011. Personalized and automatic social summarization of events in video. In Proceedings of the 16th international conference on Intelligent user interfaces. ACM, 335–338. [https://dl.acm.org/citation.cfm?id=1943459]
  18. Sanda Harabagiu and Finley Lacatusu. 2005. Topic themes for multi-document summarization. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 202–209. [https://dl.acm.org/citation.cfm?id=1076071]
  19. Leonhard Hennig, Winfried Umbrath, and Robert Wetzker. 2008. An ontologybased approach to text summarization. In Web Intelligence and Intelligent Agent Technology, 2008. WI-IAT’08. IEEE/WIC/ACM International Conference on, Vol. 3. IEEE, 291–294. [http://dl.acm.org/citation.cfm?id=1487345]
  20. Meishan Hu, Aixin Sun, and Ee-Peng Lim. 2007. Comments-oriented blog summarization by sentence extraction. In Proceedings of the sixteenth ACM conference on Conference on information and knowledge management. ACM, 901–904. [https://dl.acm.org/citation.cfm?id=1321571&CFID=824361189&CFTOKEN=11022411]
  21. Meishan Hu, Aixin Sun, and Ee-Peng Lim. 2008. Comments-oriented document summarization: understanding documents with readers’ feedback. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 291–298. [https://dl.acm.org/citation.cfm?id=1390385&CFID=824361189&CFTOKEN=11022411]
  22. Elena Lloret and Manuel Palomar. 2012. Text summarisation in progress: a literature review. Artificial Intelligence Review 37, 1 (2012), 1–41. [https://link.springer.com/article/10.1007%2Fs10462-011-9216-z]
  23. Hans Peter Luhn. 1958. The automatic creation of literature abstracts. IBM Journal of Research and Development 2, 2 (1958), 159–165. [http://www.di.ubi.pt/~jpaulo/competence/general/(1958)Luhn.pdf]
  24. Inderjeet Mani, Gary Klein, David House, Lynette Hirschman, Therese Firmin, and Beth Sundheim. 2002. SUMMAC: a text summarization evaluation. Natural Language Engineering 8, 01 (2002), 43–68. [https://www.researchgate.net/publication/231901086_SUMMAC_a_text_summarization_evaluation]
  25. Qiaozhu Mei and ChengXiang Zhai. 2008. Generating Impact-Based Summaries for Scientific Literature. In ACL, Vol. 8. Citeseer, 816–824.
  26. Rada Mihalcea and Paul Tarau. 2004. TextRank: Bringing order into texts. Association for Computational Linguistics. [https://digital.library.unt.edu/ark:/67531/metadc30962/]
  27. Rada Mihalcea and Paul Tarau. 2005. A language independent algorithm for single and multiple document summarization. (2005). [https://www.researchgate.net/publication/228340005_A_language_independent_algorithm_for_single_and_multiple_document_summarization]
  28. Liu Na, Li Ming-xia, Lu Ying, Tang Xiao-jun, Wang Hai-wen, and Xiao Peng. 2014. Mixture of topic model for multi-document summarization. In Control and Decision Conference (2014 CCDC), The 26th Chinese. IEEE, 5168–5172. [http://ieeexplore.ieee.org/document/6853102/metrics]
  29. Ani Nenkova and Amit Bagga. 2004. Facilitating email thread access by extractive summary generation. Recent advances in natural language processing III: selected papers from RANLP 2003 (2004), 287. [https://www.researchgate.net/publication/221303547_Facilitating_email_thread_access_by_extractive_summary_generation]
  30. Ani Nenkova and Kathleen McKeown. 2012. A survey of text summarization techniques. In Mining Text Data. Springer, 43–76 [https://www.mendeley.com/research-papers/survey-text-summarization-techniques/]
  31. Paula S Newman and John C Blitzer. 2003. Summarizing archived discussions: a beginning. In Proceedings of the 8th international conference on Intelligent user interfaces. ACM, 273–276. [https://dl.acm.org/citation.cfm?id=604097]
  32. You Ouyang, Wenjie Li, Sujian Li, and Qin Lu. 2011. Applying regression models to query-focused multi-document summarization. Information Processing & Management 47, 2 (2011), 227–237. [http://www.sciencedirect.com/science/article/pii/S0306457310000257]
  33. Makbule Gulcin Ozsoy, Ilyas Cicekli, and Ferda Nur Alpaslan. 2010. Text summarization of turkish texts using latent semantic analysis. In Proceedings of the 23rd international conference on computational linguistics. Association for Computational Linguistics, 869–876. [https://dl.acm.org/citation.cfm?id=1873879]
  34. Vahed Qazvinian and Dragomir R Radev. 2008. Scientific paper summarization using citation summary networks. In Proceedings of the 22nd International Conference on Computational Linguistics-Volume 1. Association for Computational Linguistics, 689–696. [https://dl.acm.org/citation.cfm?id=1599081.1599168]
  35. Vahed Qazvinian, Dragomir R Radev, Saif M Mohammad, Bonnie Dorr, David Zajic, Michael Whidby, and Taesun Moon. 2014. Generating extractive summaries of scientific paradigms. arXiv preprint arXiv:1402.0556 (2014). [https://www.researchgate.net/publication/229534087_Generating_surveys_of_scientific_paradigms]
  36. Dragomir R Radev, Eduard Hovy, and Kathleen McKeown. 2002. Introduction to the special issue on summarization. Computational linguistics 28, 4 (2002), 399–408. [https://dl.acm.org/citation.cfm?id=638178.638179]
  37. Dragomir R Radev, Hongyan Jing, and Malgorzata Budzikowska. 2000. Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies. In Proceedings of the 2000 NAACL-ANLP Workshop on Automatic Summarization. Association for Computational Linguistics, 21–30. [http://www.docin.com/p-853652484.html]
  38. Dragomir R Radev, Hongyan Jing, Małgorzata Styś, and Daniel Tam. 2004. Centroid-based summarization of multiple documents. Information Processing & Management 40, 6 (2004), 919–938. [http://www.sciencedirect.com/science/article/pii/S0306457303000955]
  39. Owen Rambow, Lokesh Shrestha, John Chen, and Chirsty Lauridsen. 2004. Summarizing email threads. In Proceedings of HLT-NAACL 2004: Short Papers. Association for Computational Linguistics, 105–108. [https://dl.acm.org/citation.cfm?id=1614011]
  40. Zhaochun Ren, Shangsong Liang, Edgar Meij, and Maarten de Rijke. 2013. Personalized time-aware tweets summarization. In Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval. ACM, 513–522. [https://staff.fnwi.uva.nl/m.derijke/wp-content/papercite-data/pdf/ren-personalized-2013.pdf]
  41. Horacio Saggion and Thierry Poibeau. 2013. Automatic text summarization: Past, present and future. In Multi-source, Multilingual Information Extraction and Summarization. Springer, 3–21. [https://hal.archives-ouvertes.fr/hal-00782442/document]
  42. Gerard Salton and Christopher Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information processing & management 24, 5 (1988), 513– 523. [http://www.sciencedirect.com/science/article/pii/0306457388900210]
  43. Yogesh Sankarasubramaniam, Krishnan Ramanathan, and Subhankar Ghosh. 2014. Text summarization using Wikipedia. Information Processing & Management 50, 3 (2014), 443–461. [http://www.sciencedirect.com/science/article/pii/S0306457314000119]
  44. Beaux P Sharifi, David I Inouye, and Jugal K Kalita. 2013. Summarization of Twitter Microblogs. Comput. J. (2013), bxt109. [http://cs.uccs.edu/~jkalita/papers/2013/SharifiBeauxComputerJournal2013.pdf]
  45. E. D. Trippe, J. B. Aguilar, Y. H. Yan, M. V. Nural, J. A. Brady, M. Assefi, S. Safaei, M. Allahyari, S. Pouriyeh, M. R. Galinski, J. C. Kissinger, and J. B. Gutierrez. 2017. A Vision for Health Informatics: Introducing the SKED Framework. An Extensible Architecture for Scientific Knowledge Extraction from Data. ArXiv e-prints (2017). arXiv:1706.07992 [https://arxiv.org/abs/1706.07992]
  46. Neural Summarization by Extracting Sentences and Words [https://arxiv.org/pdf/1603.07252.pdf]
  47. Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond [https://arxiv.org/pdf/1602.06023.pdf]
  48. A Neural Attention Model for Abstractive Sentence Summarization [https://arxiv.org/pdf/1509.00685.pdf]
  49. A Deep Reinforced Model for Abstractive Summarization [https://arxiv.org/pdf/1705.04304.pdf]
  50. Text summarization using Latent Semantic Analysis [https://www.researchgate.net/publication/220195824_Text_summarization_using_Latent_Semantic_Analysis]
  51. TextRank: Bringing Order into Texts [https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf]
  52. Sentence Extraction Based Single Document Summarization [http://oldwww.iiit.ac.in/cgi-bin/techreports/display_detail.cgi?id=IIIT/TR/2008/97]
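Several of the entries above (e.g., 26 and 51) rank sentences with TextRank, a graph-based method: sentences are nodes, edges are weighted by word overlap, and a PageRank-style power iteration scores each node. Below is a compact, self-contained sketch; the overlap similarity follows the formulation in Mihalcea & Tarau's paper, while the function names and defaults are illustrative.

```python
import math

def similarity(s1, s2):
    # Word-overlap similarity normalized by sentence lengths (TextRank paper).
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    if len(w1) <= 1 or len(w2) <= 1:
        return 0.0
    return len(w1 & w2) / (math.log(len(w1)) + math.log(len(w2)))

def textrank_summary(sentences, top_k=1, d=0.85, iters=50):
    n = len(sentences)
    sim = [[similarity(a, b) if i != j else 0.0
            for j, b in enumerate(sentences)] for i, a in enumerate(sentences)]
    out = [sum(row) or 1.0 for row in sim]  # row sums; guard against isolated nodes
    scores = [1.0] * n
    for _ in range(iters):  # power iteration of weighted PageRank
        scores = [(1 - d) + d * sum(sim[j][i] / out[j] * scores[j] for j in range(n))
                  for i in range(n)]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    # Return the top-k sentences in their original document order.
    return [sentences[i] for i in sorted(ranked[:top_k])]
```

A sentence that overlaps heavily with many others accumulates the highest score and is extracted as the summary.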

Code

  1. Sequence-to-Sequence with Attention Model for Text Summarization.
    [https://github.com/tensorflow/models/tree/master/research/textsum]
  2. gensim.summarization offers TextRank summarization
    [https://radimrehurek.com/gensim/summarization/summariser.html]

Tutorial

  1. Automatic Text Summarization: Status and Future (文本自动摘要:现状与未来), 万小军 (Xiaojun Wan), Peking University, October 16, 2016 [https://pan.baidu.com/s/1nuTUrSP]
  2. Tutorial on automatic summarization [https://www.slideshare.net/dinel/orasan-ranlp2009] [https://pan.baidu.com/s/1o8bZJJk]
  3. How to Run Text Summarization with TensorFlow [https://hackernoon.com/how-to-run-text-summarization-with-tensorflow-d4472587602d]
  4. Text Summarization with Gensim [https://rare-technologies.com/text-summarization-with-gensim/]

Datasets

  1. DUC 2004 [http://www.cis.upenn.edu/~nlp/corpora/sumrepo.html]
  2. Opinosis Dataset - Topic related review sentences [http://kavita-ganesan.com/opinosis-opinion-dataset]
  3. 17 Timelines [http://kavita-ganesan.com/opinosis-opinion-dataset]
  4. Legal Case Reports Data Set [http://archive.ics.uci.edu/ml/datasets/Legal+Case+Reports]

Domain Experts

  1. 万小军 (Xiaojun Wan), Peking University [https://sites.google.com/site/wanxiaojun1979/]
  2. 秦兵 (Bing Qin), Harbin Institute of Technology [https://m.weibo.cn/u/1880324342]
  3. 刘挺 (Ting Liu), Harbin Institute of Technology [http://homepage.hit.edu.cn/pages/liuting]


Abstract

Recommender systems have been widely applied in different real-life scenarios to help us find useful information. Recently, recommender systems based on reinforcement learning (RL) have become an emerging research topic. Thanks to their interactive nature and autonomous learning ability, they often outperform traditional recommendation models, and even most deep-learning-based methods. Nevertheless, applying RL in recommender systems still faces various challenges. To this end, we first provide a thorough overview, comparison, and summarization of RL approaches for five typical recommendation scenarios, following three main RL categories: value function, policy search, and actor-critic. Then, based on the existing literature, we systematically analyze the challenges and corresponding solutions. Finally, by discussing the open issues and limitations of current RL research, we point out potential research directions in this field.

https://arxiv.org/abs/2109.10665

Introduction

Personalized recommender systems provide interesting information that matches users' preferences, helping to alleviate the information overload problem. Over the past two decades, recommender systems have been studied extensively and many recommendation methods have been developed. These methods typically make personalized recommendations based on user preferences, item features, and user-item interactions. Some recommendation methods also exploit additional information, such as social relations among users (e.g., social recommendation), temporal data (e.g., sequential recommendation), and location-aware information (e.g., POI (short for "point of interest") recommendation).

Recommendation techniques typically exploit various kinds of information to offer potentially relevant items to users. In real-world scenarios, a recommender system recommends items based on the user's interaction history with items, then receives user feedback for further recommendation. That is, the goal of a recommender system is to learn the user's preferences through interaction and to recommend items the user may be interested in. To this end, early recommendation research focused mainly on developing content-based and collaborative-filtering-based methods [2], [3]. Matrix factorization is one of the most representative traditional recommendation methods. In recent years, thanks to the rapid development of deep learning, various neural recommendation methods have been developed [4]. However, existing recommendation methods often ignore the interaction between users and the recommendation model; they cannot effectively capture timely user feedback to update the model, which often leads to suboptimal recommendation results.
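Matrix factorization, mentioned above as the most representative traditional method, learns low-rank user and item factor matrices whose product approximates the observed ratings. The following is a minimal SGD sketch; the toy rating matrix, hyperparameters, and function name are illustrative, not from any particular paper.

```python
import numpy as np

# Toy user-item rating matrix (0 = unobserved). Purely illustrative data.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

def matrix_factorization(R, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, k))  # user latent factors
    Q = rng.normal(scale=0.1, size=(n_items, k))  # item latent factors
    obs = np.argwhere(R > 0)                      # observed (user, item) pairs
    for _ in range(steps):
        for u, i in obs:
            err = R[u, i] - P[u] @ Q[i]
            # SGD step on the L2-regularized squared error.
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q

P, Q = matrix_factorization(R)
pred = P @ Q.T  # predicted scores, including the unobserved cells
```

The product `P @ Q.T` fills in the unobserved cells, which is exactly the score matrix a recommender ranks items by.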

In general, the recommendation task can be modeled as an interactive process: a user is recommended an item and then provides feedback to the recommendation model (e.g., skip, click, or purchase). In the next interaction, the model learns from the user's explicit/implicit feedback and recommends a new item. From the user's perspective, efficient interaction means helping the user find the right items as quickly as possible; from the model's perspective, it is necessary to balance novelty, relevance, and diversity across multiple rounds of recommendation. Interactive recommendation methods have been successfully applied to real-world recommendation tasks. However, they often encounter problems such as cold start [5] and data sparsity [6], as well as challenges such as explainability [7] and security [8].
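The interaction loop just described can be sketched as a simple epsilon-greedy bandit: the model recommends the item with the best estimated click rate, observes feedback, and updates its estimates. The simulated user, item names, and click probabilities below are all hypothetical.

```python
import random

# Hypothetical simulated user: each item has a fixed click probability.
TRUE_CLICK_PROB = {"news": 0.8, "sports": 0.3, "music": 0.5}

def interactive_recommend(rounds=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    items = list(TRUE_CLICK_PROB)
    clicks = {i: 0 for i in items}  # observed positive feedback per item
    shows = {i: 0 for i in items}   # how often each item was recommended
    for _ in range(rounds):
        if rng.random() < epsilon:
            item = rng.choice(items)  # exploration: try a random item
        else:
            # Exploitation: best empirical click rate (optimistic 1.0 if unseen).
            item = max(items, key=lambda i: clicks[i] / shows[i] if shows[i] else 1.0)
        shows[item] += 1
        clicks[item] += rng.random() < TRUE_CLICK_PROB[item]  # user feedback
    return max(items, key=lambda i: clicks[i] / shows[i] if shows[i] else 0.0)

print(interactive_recommend())
```

After enough rounds the loop settles on the item users click most often, while the exploration term keeps the estimates for the other items up to date.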

As a field of machine learning focused on how intelligent agents interact with an environment, reinforcement learning (RL) offers a potential solution for modeling the interaction between users and agents. Recent successes of RL have driven progress in artificial intelligence research [9], [10]. In particular, deep reinforcement learning (DRL) [11], with its powerful representation learning and function approximation capabilities, can tackle challenging AI problems and has been applied in many domains, such as games [12], robotics [13], and networking [14]. In recent years, applying RL to recommendation problems has become a new trend in recommendation research. Specifically, RL enables a recommendation agent to continuously interact with the environment (e.g., users and/or logged data) to learn the optimal recommendation policy. In practice, RL-based recommender systems have been applied to many specific scenarios, such as e-commerce [18], e-learning [19], movie recommendation [20], music recommendation [21], news recommendation [22], job-skill recommendation [23], healthcare [24], and energy optimization [25].

To promote research on RL-based recommender systems, this paper summarizes existing solutions to recommendation problems, systematically analyzes the challenges of applying RL to recommendation methods, and explores potential future research directions. From a theoretical perspective, we review existing work on environment construction, prior knowledge, reward function definition, learning bias, and task structuring. Environment construction can mitigate the exploration-exploitation trade-off; prior knowledge and reward definition are key to making recommendation decisions; and task structuring can help address the curse of dimensionality. From an application perspective, we also provide a comprehensive survey of RL-based recommender systems, organized by value function, policy search, and actor-critic methods. Notably, [26] also reviews RL- and DRL-based recommendation algorithms and proposes several research directions concerning recommendation lists, architectures, explainability, and evaluation. [27] mainly surveys DRL-based recommender systems from the perspectives of model-based and model-free algorithms, highlighting open problems and emerging topics in DRL-based recommendation. Unlike [26] and [27], we organize existing (D)RL recommendation methods according to a different taxonomy (i.e., value function, policy search, and actor-critic) and analyze the challenges of applying (D)RL in recommender systems.

The main contributions of this work are as follows:

  • We comprehensively review the RL methods developed for five typical recommendation scenarios. For each scenario, we give detailed descriptions of representative models, summarize the specific RL algorithms used in the literature, and make the necessary comparisons.

  • We systematically analyze the challenges of applying RL in recommender systems, including environment construction, prior knowledge, reward function definition, learning bias, and task structuring.

  • We also discuss open problems in RL, analyze the practical challenges in this field, and suggest possible future research and application directions.

The rest of this paper is organized as follows. Section 2 introduces the background of RL, defines the relevant concepts, and lists commonly used methods. Section 3 gives a formal definition of RL-based recommendation. Section 4 comprehensively reviews the RL algorithms developed for recommender systems. Section 5 discusses the challenges of applying RL in recommender systems and the corresponding solutions. Section 6 then discusses the various limitations of RL-based recommender systems and potential research directions. Finally, Section 7 concludes this study.


Latest Papers

Point cloud registration plays a critical role in a multitude of computer vision tasks, such as pose estimation and 3D localization. Recently, a plethora of deep learning methods were formulated that aim to tackle this problem. Most of these approaches find point or feature correspondences, from which the transformations are computed. We give a different perspective and frame the registration problem as a Markov Decision Process. Instead of directly searching for the transformation, the problem becomes one of finding a sequence of translation and rotation actions that is equivalent to this transformation. To this end, we propose an artificial agent trained end-to-end using deep supervised learning. In contrast to conventional reinforcement learning techniques, the observations are sampled i.i.d. and thus no experience replay buffer is required, resulting in a more streamlined training process. Experiments on ModelNet40 show results comparable or superior to the state of the art in the case of clean, noisy and partially visible datasets.
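The paper's framing can be illustrated with a toy 2-D version: instead of regressing the transformation directly, alignment is produced by a sequence of small discrete translation actions. Here a greedy oracle stands in for the trained agent, the data and action set are invented for illustration, and rotation actions are omitted for brevity.

```python
import numpy as np

# Two 2-D point sets that differ by a translation (a toy stand-in for point clouds).
source = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
target = source + np.array([0.7, -0.4])

# Discrete action set: small translations along each axis (a simplification of
# the paper's translation + rotation actions).
ACTIONS = [np.array(a) for a in ((0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1))]

def alignment_error(pts):
    return float(np.sum((pts - target) ** 2))

def register(points, max_steps=50):
    """Greedily pick the action that most reduces the alignment error,
    i.e. an oracle policy standing in for the learned agent."""
    actions_taken = []
    for _ in range(max_steps):
        best = min(ACTIONS, key=lambda a: alignment_error(points + a))
        if alignment_error(points + best) >= alignment_error(points):
            break  # no action improves the alignment: stop
        points = points + best
        actions_taken.append(best)
    return points, actions_taken

aligned, seq = register(source)
```

The returned action sequence is equivalent to the sought transformation, which is exactly the reformulation the abstract describes; the learned agent replaces the greedy oracle with a policy trained via deep supervised learning.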
