This paper shows how reinforcement learning can be used to derive optimal hedging strategies for derivatives when there are transaction costs. The paper illustrates the approach by comparing delta hedging with optimal hedging for a short position in a call option when the objective is to minimize a function equal to the mean hedging cost plus a constant times the standard deviation of the hedging cost. Two situations are considered. In the first, the asset price follows a geometric Brownian motion. In the second, the asset price follows a stochastic volatility process. The paper extends the basic reinforcement learning approach in a number of ways. First, it uses two different Q-functions so that both the expected value of the cost and the expected value of the square of the cost are tracked for different state/action combinations. This approach increases the range of objective functions that can be used. Second, it uses a learning algorithm that allows for continuous state and action spaces. Third, it compares the accounting P&L approach (where the hedged position is valued at each step) with the cash flow approach (where cash inflows and outflows are used). We find that a hybrid approach, in which the accounting P&L is computed with a relatively simple valuation model, works well. The valuation model does not have to correspond to the process assumed for the underlying asset price.
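To make the objective concrete, the sketch below evaluates the delta-hedging baseline by Monte Carlo: it simulates geometric Brownian motion, delta hedges a short call with proportional transaction costs, and records the hedging cost on each path using the cash flow approach. It then computes mean + c × standard deviation from only the first two moments, E[C] and E[C²], which is why tracking those two quantities (as the two Q-functions do, conditionally on state and action) is enough to evaluate this objective. This is not the paper's reinforcement learning method. All function names, parameter values, the zero interest rate, the proportional cost `kappa`, and the choice to liquidate the stock position at maturity are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution for Black-Scholes formulas


def bs_call_delta(S, K, sigma, tau):
    """Black-Scholes call delta, assuming zero interest rate and dividends."""
    if tau <= 0.0:
        return 1.0 if S > K else 0.0
    d1 = (math.log(S / K) + 0.5 * sigma ** 2 * tau) / (sigma * math.sqrt(tau))
    return _N.cdf(d1)


def bs_call_price(S, K, sigma, tau):
    """Black-Scholes call price, assuming zero interest rate and dividends."""
    if tau <= 0.0:
        return max(S - K, 0.0)
    sd = sigma * math.sqrt(tau)
    d1 = (math.log(S / K) + 0.5 * sd ** 2) / sd
    return S * _N.cdf(d1) - K * _N.cdf(d1 - sd)


def delta_hedge_costs(S0=100.0, K=100.0, mu=0.05, sigma=0.2, T=0.25,
                      n_steps=50, kappa=0.0, n_paths=5000, seed=0):
    """Cash-flow hedging cost of a delta-hedged short call under GBM (hypothetical helper).

    Trading q shares at price S costs q * S plus a proportional
    transaction cost kappa * |q| * S.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    costs = []
    for _ in range(n_paths):
        S, holding, cost = S0, 0.0, 0.0
        for i in range(n_steps):
            target = bs_call_delta(S, K, sigma, T - i * dt)
            trade = target - holding
            cost += trade * S + kappa * abs(trade) * S  # rebalance to delta
            holding = target
            z = rng.gauss(0.0, 1.0)
            S *= math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
        # At maturity: sell the shares held (paying costs) and pay the call payoff.
        cost += -holding * S + kappa * abs(holding) * S + max(S - K, 0.0)
        costs.append(cost)
    return costs


def objective_from_moments(m1, m2, c):
    """Mean + c * std computed only from E[C] (m1) and E[C^2] (m2)."""
    return m1 + c * math.sqrt(max(m2 - m1 * m1, 0.0))


if __name__ == "__main__":
    costs = delta_hedge_costs(kappa=0.01)
    m1 = sum(costs) / len(costs)
    m2 = sum(x * x for x in costs) / len(costs)
    print("mean cost:", m1, "objective (c=1.5):", objective_from_moments(m1, m2, 1.5))
```

With `kappa=0` and frequent rebalancing, the mean cost should be close to the Black-Scholes price of the call; adding transaction costs raises the cost on every path. An optimal (learned) strategy would trade less aggressively than delta hedging to lower the objective.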