Paper Title
Model-Free Reinforcement Learning for Asset Allocation
Paper Authors
Paper Abstract
Asset allocation (or portfolio management) is the task of determining how to optimally allocate funds from a finite budget across a range of financial instruments/assets, such as stocks. This study investigated the performance of reinforcement learning (RL) applied to portfolio management using model-free deep RL agents. We trained several RL agents on real-world stock prices to learn how to perform asset allocation. We compared the performance of these RL agents against several baseline agents, and we also compared the RL agents among themselves to understand which classes of agents performed better. From our analysis, RL agents can perform the task of portfolio management, as they significantly outperformed two of the baseline agents (random allocation and uniform allocation). Four RL agents (A2C, SAC, PPO, and TRPO) outperformed the best baseline, MPT (modern portfolio theory), overall. This demonstrates the ability of RL agents to uncover more profitable trading strategies. Furthermore, there were no significant performance differences between value-based and policy-based RL agents. Actor-critic agents performed better than other types of agents. Likewise, on-policy agents performed better than off-policy agents, because they are better at policy evaluation and because sample efficiency is not a significant problem in portfolio management. This study shows that RL agents can substantially improve asset allocation, since they outperformed strong baselines. On-policy, actor-critic RL agents showed the most promise based on our analysis.
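As a rough illustration of the setup the abstract describes (not the authors' code), the sketch below trains PPO, one of the four RL agents named above, on a toy portfolio-allocation environment. Everything in the environment is an assumption made for illustration: synthetic daily returns stand in for the real stock prices, actions are mapped to portfolio weights via a softmax, and the reward is the one-step log portfolio return.

```python
# Minimal sketch, assuming stable-baselines3 and gymnasium are installed.
# The environment (synthetic returns, softmax weights, log-return reward)
# is a hypothetical stand-in for the paper's actual setup.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class PortfolioEnv(gym.Env):
    """Toy asset-allocation environment over synthetic daily returns."""

    def __init__(self, n_assets=4, horizon=252, seed=0):
        super().__init__()
        self.n_assets, self.horizon = n_assets, horizon
        self.rng = np.random.default_rng(seed)
        # Observation: the most recent per-asset simple returns.
        self.observation_space = spaces.Box(-np.inf, np.inf, (n_assets,), np.float32)
        # Action: unnormalized allocation scores, mapped to weights via softmax.
        self.action_space = spaces.Box(-1.0, 1.0, (n_assets,), np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # Synthetic daily returns (placeholder for real stock-price data).
        self.returns = self.rng.normal(0.0004, 0.01, (self.horizon, self.n_assets))
        self.t = 0
        return self.returns[self.t].astype(np.float32), {}

    def step(self, action):
        # Project scores onto the probability simplex (long-only weights).
        weights = np.exp(action) / np.exp(action).sum()
        # Reward: log growth of the portfolio over one step.
        reward = float(np.log1p(weights @ self.returns[self.t]))
        self.t += 1
        terminated = self.t >= self.horizon - 1
        obs = self.returns[self.t].astype(np.float32)
        return obs, reward, terminated, False, {}

env = PortfolioEnv()
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=20_000)
```

The uniform-allocation baseline mentioned in the abstract corresponds to fixing equal weights (1/n per asset) at every step, and the random baseline to sampling weights from the simplex; comparing cumulative log returns against these is one simple way to reproduce the kind of baseline comparison the study reports.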