Demand response (DR), one of the important energy resources in the future grid, provides peak shaving and improves the efficiency of renewable energy utilization, with a short response period and low cost. Various categories of DR have been established, e.g., automated DR, incentive DR, emergency DR, and demand bidding. However, because the utility models of residential and commercial consumers are unknown in practice, research on demand bidding aggregators participating in the electricity market is still at an early stage. In this problem, the bidding price and bidding quantity are the two required decision variables, and both must be chosen under uncertainties arising from the market and the participants. In this paper, we determine the bidding and purchasing strategies simultaneously, employing smart meter data and functions. A two-agent deep deterministic policy gradient method is developed to optimize the decisions by learning from historical bidding experiences. Online learning further uses the newest daily bidding experience to ensure trend tracking and self-adaptation. Two environment simulators are adopted to test the robustness of the model. The results show that, when facing diverse situations, the proposed model can earn the optimal profit by learning the bidding rules offline and online and by robustly making proper bids.