Paper title
The Greatest Teacher, Failure is: Using Reinforcement Learning for SFC Placement Based on Availability and Energy Consumption
Authors
Abstract
Software defined networking (SDN) and network functions virtualisation (NFV) are making networks programmable and consequently much more flexible and agile. To meet service level agreements, achieve greater utilisation of legacy networks, deploy services faster, and reduce expenditure, telecommunications operators are deploying increasingly complex service function chains (SFCs). Notwithstanding the benefits of SFCs, increasing heterogeneity and dynamism from the cloud to the edge introduce significant SFC placement challenges, not least adding or removing network functions while maintaining availability and quality of service and minimising cost. In this paper, an availability- and energy-aware solution based on reinforcement learning (RL) is proposed for dynamic SFC placement. Two policy-aware RL algorithms, Advantage Actor-Critic (A2C) and Proximal Policy Optimisation (PPO2), are compared using simulations of a ground truth network topology based on the Rede Nacional de Ensino e Pesquisa (RNP) Network, Brazil's National Teaching and Research Network backbone. The simulation results showed that PPO2 generally outperformed both A2C and a greedy approach in terms of acceptance rate and energy consumption. A2C outperformed PPO2 only in the scenario where network servers had more computing resources.