安全多试剂深强化学习,以联合投标和维护发电单位时间表安排 (Safe multi-agent deep reinforcement learning for joint bidding and maintenance scheduling of generation units) - 专知论文

会员服务 ·

0

强化学习 · 学成 · INFORMS · 评论员 · 优化器 ·

2021 年 12 月 20 日

Safe multi-agent deep reinforcement learning for joint bidding and maintenance scheduling of generation units

翻译：安全多试剂深强化学习,以联合投标和维护发电单位时间表安排

Pegah Rokhforoz,Olga Fink

This paper proposes a safe reinforcement learning algorithm for generation bidding decisions and unit maintenance scheduling in a competitive electricity market environment. In this problem, each unit aims to find a bidding strategy that maximizes its revenue while concurrently retaining its reliability by scheduling preventive maintenance. The maintenance scheduling provides some safety constraints which should be satisfied at all times. Satisfying the critical safety and reliability constraints while the generation units have an incomplete information of each others' bidding strategy is a challenging problem. Bi-level optimization and reinforcement learning are state of the art approaches for solving this type of problems. However, neither bi-level optimization nor reinforcement learning can handle the challenges of incomplete information and critical safety constraints. To tackle these challenges, we propose the safe deep deterministic policy gradient reinforcement learning algorithm which is based on a combination of reinforcement learning and a predicted safety filter. The case study demonstrates that the proposed approach can achieve a higher profit compared to other state of the art methods while concurrently satisfying the system safety constraints.

翻译：本文提出了在竞争性电力市场环境下为产生投标决定和单位维护时间安排制定安全强化学习算法,在这一问题下,每个单位都旨在寻找一个通过安排预防性维护既能最大限度地增加收入又同时保持可靠性的招标战略; 维护时间安排提供了一定的安全限制,这些限制在任何时候都应得到满足; 满足关键的安全和可靠性限制,而各生成单位对彼此的招标战略的信息不完全,这是一个具有挑战性的问题; 双级优化和加强学习是解决这类问题的最先进方法; 然而,无论是双级优化还是强化学习,都无法应对不完整信息和关键安全限制的挑战; 为了应对这些挑战,我们提议采用安全、深层的确定性政策梯度强化学习算法,该算法以强化学习与预测的安全过滤器相结合为基础; 案例研究表明,拟议的方法能够比其他工艺获得更高的利润,同时满足系统安全限制。

0

相关内容

强化学习

强化学习（RL）是机器学习的一个领域，与软件代理应如何在环境中采取行动以最大化累积奖励的概念有关。除了监督学习和非监督学习外，强化学习是三种基本的机器学习范式之一。强化学习与监督学习的不同之处在于，不需要呈现带标签的输入/输出对，也不需要显式纠正次优动作。相反，重点是在探索（未知领域）和利用（当前知识）之间找到平衡。该环境通常以马尔可夫决策过程（MDP）的形式陈述，因为针对这种情况的许多强化学习算法都使用动态编程技术。经典动态规划方法和强化学习算法之间的主要区别在于，后者不假设MDP的确切数学模型，并且针对无法采用精确方法的大型MDP。

知识荟萃

精品入门和进阶教程、论文和代码整理等

更多

查看相关VIP内容、论文、资讯等

【MIT】反偏差对比学习，Debiased Contrastive Learning

【MIT】反偏差对比学习，Debiased Contrastive Learning

专知会员服务

91+阅读 · 2020年7月4日

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

专知会员服务

111+阅读 · 2020年6月10日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

131+阅读 · 2020年4月19日

【伯克利，基于模型的强化学习：理论与实践】《Model-Based Reinforcement Learning:Theory and Practice》，Michael Janner

【伯克利，基于模型的强化学习：理论与实践】《Model-Based Reinforcement Learning:Theory and Practice》，Michael Janner

专知会员服务

35+阅读 · 2019年12月12日

【课程推荐】深度学习中的新兴挑战（Emerging Challenges in Deep Learning）

【课程推荐】深度学习中的新兴挑战（Emerging Challenges in Deep Learning）

专知会员服务

17+阅读 · 2019年11月10日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

学术会议 | 知识图谱顶会 ISWC 征稿：Poster/Demo

学术会议 | 知识图谱顶会 ISWC 征稿：Poster/Demo

开放知识图谱

5+阅读 · 2019年4月16日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

LibRec 每周算法：Wide & Deep (by Google)

LibRec 每周算法：Wide & Deep (by Google)

LibRec智能推荐

9+阅读 · 2017年10月25日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

Efficient Learning of Safe Driving Policy via Human-AI Copilot Optimization

Arxiv

0+阅读 · 2022年2月17日

A Survey of Generalisation in Deep Reinforcement Learning

Arxiv

4+阅读 · 2021年11月18日

Density Constrained Reinforcement Learning

Arxiv

6+阅读 · 2021年6月24日

Inverse Constrained Reinforcement Learning

Arxiv

8+阅读 · 2021年5月21日

Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges, Solutions and Applications

Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges, Solutions and Applications

Arxiv

4+阅读 · 2018年12月31日

Reinforcement Learning with Perturbed Rewards

Arxiv

4+阅读 · 2018年10月5日

A Multi-Objective Deep Reinforcement Learning Framework

A Multi-Objective Deep Reinforcement Learning Framework

Arxiv

16+阅读 · 2018年6月27日

Do deep reinforcement learning agents model intentions?

Arxiv

5+阅读 · 2018年5月21日

Logically-Constrained Reinforcement Learning

Arxiv

5+阅读 · 2018年4月22日

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

Arxiv

6+阅读 · 2018年1月16日

VIP会员

文章信息

相关主题

相关VIP内容

【MIT】反偏差对比学习，Debiased Contrastive Learning

【MIT】反偏差对比学习，Debiased Contrastive Learning

专知会员服务

91+阅读 · 2020年7月4日

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

专知会员服务

111+阅读 · 2020年6月10日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

131+阅读 · 2020年4月19日

【伯克利，基于模型的强化学习：理论与实践】《Model-Based Reinforcement Learning:Theory and Practice》，Michael Janner

【伯克利，基于模型的强化学习：理论与实践】《Model-Based Reinforcement Learning:Theory and Practice》，Michael Janner

专知会员服务

35+阅读 · 2019年12月12日

【课程推荐】深度学习中的新兴挑战（Emerging Challenges in Deep Learning）

【课程推荐】深度学习中的新兴挑战（Emerging Challenges in Deep Learning）

专知会员服务

17+阅读 · 2019年11月10日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【伯克利博士论文】通过真实世界实践赋能机器人自主性

军用无人机集群技术尚未成熟——但潜力可期

人工智能安全治理白皮书（2025）

AgentOps综述：分类、挑战与未来方向

相关资讯

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

学术会议 | 知识图谱顶会 ISWC 征稿：Poster/Demo

学术会议 | 知识图谱顶会 ISWC 征稿：Poster/Demo

开放知识图谱

5+阅读 · 2019年4月16日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

LibRec 每周算法：Wide & Deep (by Google)

LibRec 每周算法：Wide & Deep (by Google)

LibRec智能推荐

9+阅读 · 2017年10月25日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Efficient Learning of Safe Driving Policy via Human-AI Copilot Optimization

Arxiv

0+阅读 · 2022年2月17日

A Survey of Generalisation in Deep Reinforcement Learning

Arxiv

4+阅读 · 2021年11月18日

Density Constrained Reinforcement Learning

Arxiv

6+阅读 · 2021年6月24日

Inverse Constrained Reinforcement Learning

Arxiv

8+阅读 · 2021年5月21日

Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges, Solutions and Applications

Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges, Solutions and Applications

Arxiv

4+阅读 · 2018年12月31日

Reinforcement Learning with Perturbed Rewards

Arxiv

4+阅读 · 2018年10月5日

A Multi-Objective Deep Reinforcement Learning Framework

A Multi-Objective Deep Reinforcement Learning Framework

Arxiv

16+阅读 · 2018年6月27日

Do deep reinforcement learning agents model intentions?

Arxiv

5+阅读 · 2018年5月21日

Logically-Constrained Reinforcement Learning

Arxiv

5+阅读 · 2018年4月22日

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

Arxiv

6+阅读 · 2018年1月16日

微信扫码咨询专知VIP会员