We consider the problem of optimal charging/discharging of a bank of heterogeneous battery units, driven by stochastic electricity generation and demand processes. The batteries in the bank may differ in their capacities, ramp constraints, losses, and cycling costs. The goal is to minimize the long-run degradation costs associated with battery cycling, a problem we formulate as a Markov decision process (MDP). We propose a Q-learning algorithm based on linear function approximation for learning the optimal solution, using a specially designed class of kernel functions that approximate the structure of the value functions associated with the MDP. The proposed algorithm is validated via an extensive case study.
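To make the learning step concrete, the following is a minimal sketch of generic Q-learning with linear function approximation on a deliberately tiny, hypothetical two-battery bank. Everything specific here is an assumption rather than the paper's setup: the capacities, cycling costs, unmet-imbalance penalty, the i.i.d. net-imbalance process, the one-unit-per-step ramp limit, and especially the feature map, which is a plain per-action block of state features standing in for the specially designed kernel class (it is not that class). Battery losses are not modeled. Because the objective is a cost, the greedy step takes a minimum over actions.

```python
import random

# Hypothetical toy bank (assumed values, not from the paper).
CAPACITY = (4, 2)        # energy units each battery can hold
CYCLE_COST = (1.0, 0.5)  # degradation cost per unit charged or discharged
UNMET_PENALTY = 5.0      # cost per unit of imbalance the bank fails to absorb/cover
# Per-battery action: +1 charge, 0 idle, -1 discharge (implicit ramp limit of 1 unit/step).
ACTIONS = [(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)]


def step(state, action, rng):
    """Apply an action and return (cost, next_state).

    state = (levels, imbalance); imbalance +1 is a generation surplus the bank
    should absorb, -1 a deficit it should cover (assumed i.i.d. uniform).
    """
    levels, imbalance = state
    new_levels, cycling, net_charge = [], 0.0, 0
    for lvl, cap, cc, u in zip(levels, CAPACITY, CYCLE_COST, action):
        nl = min(max(lvl + u, 0), cap)  # clip to [0, capacity]
        moved = nl - lvl
        cycling += cc * abs(moved)
        net_charge += moved
        new_levels.append(nl)
    cost = cycling + UNMET_PENALTY * abs(imbalance - net_charge)
    return cost, (tuple(new_levels), rng.choice((-1, 0, 1)))


def features(state, action):
    """Hypothetical feature map: one block per action holding simple state features.
    A generic stand-in, NOT the paper's kernel functions."""
    levels, imbalance = state
    block = [1.0, float(imbalance == -1), float(imbalance == 1)]
    block += [lvl / cap for lvl, cap in zip(levels, CAPACITY)]
    phi = [0.0] * (len(ACTIONS) * len(block))
    k = ACTIONS.index(action) * len(block)
    phi[k:k + len(block)] = block
    return phi


def q_value(w, state, action):
    """Linear approximation Q(s, a) = w . phi(s, a)."""
    return sum(wi * fi for wi, fi in zip(w, features(state, action)))


def greedy(w, state):
    """Cost-minimizing action under the current weights."""
    return min(ACTIONS, key=lambda a: q_value(w, state, a))


def q_learning(episodes=200, horizon=50, alpha=0.05, gamma=0.9, eps=0.1, seed=0):
    """Epsilon-greedy Q-learning with a semi-gradient update on linear weights."""
    rng = random.Random(seed)
    w = [0.0] * len(features(((0, 0), 0), ACTIONS[0]))
    for _ in range(episodes):
        state = ((CAPACITY[0] // 2, CAPACITY[1] // 2), rng.choice((-1, 0, 1)))
        for _ in range(horizon):
            action = rng.choice(ACTIONS) if rng.random() < eps else greedy(w, state)
            cost, nxt = step(state, action, rng)
            target = cost + gamma * q_value(w, nxt, greedy(w, nxt))
            td_error = target - q_value(w, state, action)
            phi = features(state, action)
            w = [wi + alpha * td_error * fi for wi, fi in zip(w, phi)]
            state = nxt
    return w
```

For example, `w = q_learning()` followed by `greedy(w, ((2, 1), 1))` returns the action the learned weights rate cheapest when the battery levels are (2, 1) and there is a one-unit generation surplus. The update is the standard semi-gradient step `w += alpha * td_error * phi(s, a)`; replacing the stand-in `features` with a richer kernel class would leave the rest of the loop unchanged.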