This paper contributes to the existing literature on hedging American options with Deep Reinforcement Learning (DRL). The study first investigates the impact of hyperparameters on hedging performance, considering learning rates, training episodes, neural network architectures, training steps, and transaction cost penalty functions. The results highlight the importance of avoiding certain combinations, such as high learning rates with many training episodes or low learning rates with few training episodes, and emphasize that moderate values yield the best outcomes. The paper further warns against excessive training steps, which induce instability, and demonstrates the superiority of a quadratic transaction cost penalty function over a linear one. The study then builds on the work of Pickard et al. (2024), who use a Chebyshev interpolation option pricing method to train DRL agents with market-calibrated stochastic volatility models. While Pickard et al. (2024) showed that these DRL agents achieve satisfactory performance on empirical asset paths, this study introduces a novel approach in which new agents are re-trained at weekly intervals on newly calibrated stochastic volatility models. Results show that DRL agents re-trained with weekly market data surpass the performance of those trained solely on the sale date. Furthermore, the paper demonstrates that both single-train and weekly-train DRL agents outperform the Black-Scholes Delta method at transaction costs of 1% and 3%. This practical relevance suggests that practitioners can leverage readily available market data to train DRL agents for the effective hedging of options in their portfolios.
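As a minimal illustration of the two penalty forms compared (the notation here is assumed for exposition, not taken from the paper): with $\kappa$ the proportional transaction cost rate, $S_t$ the asset price, and $\Delta n_t$ the change in the hedge position at rebalancing time $t$, the per-step penalties subtracted from the agent's reward are typically of the form
$$
\text{penalty}^{\text{lin}}_t = \kappa\, S_t\, \lvert \Delta n_t \rvert,
\qquad
\text{penalty}^{\text{quad}}_t = \kappa \left( S_t\, \Delta n_t \right)^2 .
$$
Under this sketch, the quadratic form penalizes large rebalancing trades disproportionately, which encourages smoother hedging policies than the linear form.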