Paper title
Deep Reinforcement Learning and Convex Mean-Variance Optimisation for Portfolio Management
Paper authors
Paper abstract
Traditional portfolio management methods can incorporate specific investor preferences but rely on accurate forecasts of asset returns and covariances. Reinforcement learning (RL) methods do not rely on these explicit forecasts and are better suited to multi-stage decision processes. To address limitations of the research evaluated, experiments were conducted on three markets in different economies with different overall trends. Incorporating specific investor preferences into our RL models' reward functions allowed a more comprehensive comparison with traditional methods in risk-return space. Transaction costs were also modelled more realistically by including the nonlinear changes introduced by market volatility and trading volume. The results of this study suggest that RL methods can have an advantage over traditional convex mean-variance optimisation methods under certain market conditions. Our RL models could significantly outperform traditional single-period optimisation (SPO) and multi-period optimisation (MPO) models in upward trending markets, but only up to specific risk limits. In sideways trending markets, our RL models can closely match the performance of SPO and MPO models for the majority of the excess risk range tested. The specific market conditions under which these models could outperform each other highlight the importance of a more comprehensive comparison of Pareto optimal frontiers in risk-return space. These frontiers give investors a more granular view of which models might provide better performance for their specific risk tolerance or return targets.
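To make the abstract's two modelling ideas concrete, the following is a minimal Python sketch of a risk-adjusted reward with a nonlinear transaction cost. It is an illustration under stated assumptions, not the paper's implementation: the function names, the 3/2-power market impact form, and the parameters `a`, `b`, `gamma_risk`, and `gamma_trade` are all hypothetical choices made here for the example.

```python
import numpy as np


def transaction_cost(trade, sigma, volume, a=0.0005, b=1.0):
    """Hypothetical cost model (an assumption, not taken from the paper).

    A linear term (a * |trade|) stands in for the bid-ask spread, and a
    3/2-power term grows with volatility (sigma) and shrinks with traded
    volume, which makes the total cost nonlinear in the trade size.
    """
    size = np.abs(np.asarray(trade, dtype=float))
    sigma = np.asarray(sigma, dtype=float)
    volume = np.asarray(volume, dtype=float)
    return float(np.sum(a * size + b * sigma * size ** 1.5 / np.sqrt(volume)))


def risk_adjusted_reward(w_prev, w_new, returns, cov, sigma, volume,
                         gamma_risk=1.0, gamma_trade=1.0):
    """Period return minus risk and trading penalties.

    gamma_risk and gamma_trade are hypothetical preference weights: a
    larger gamma_risk describes a more risk-averse investor.
    """
    w_prev = np.asarray(w_prev, dtype=float)
    w_new = np.asarray(w_new, dtype=float)
    portfolio_return = float(w_new @ np.asarray(returns, dtype=float))
    variance = float(w_new @ np.asarray(cov, dtype=float) @ w_new)
    cost = transaction_cost(w_new - w_prev, sigma, volume)
    return portfolio_return - gamma_risk * variance - gamma_trade * cost
```

Sweeping `gamma_risk` over a range of values and recording the resulting risk and return of each policy is one way to trace a frontier in risk-return space, which is the kind of comparison the abstract describes.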