Deep Reinforcement Learning for Cybersecurity Assessment of Wind Integrated Power Systems

Paper Title

Deep Reinforcement Learning for Cybersecurity Assessment of Wind Integrated Power Systems

Authors

Liu, XiaoRui; Ospina, Juan; Konstantinou, Charalambos

Abstract

The integration of renewable energy sources (RES) is rapidly increasing in electric power systems (EPS). While the inclusion of intermittent RES coupled with the wide-scale deployment of communication and sensing devices is important towards a fully smart grid, it has also expanded the cyber-threat landscape, effectively making power systems vulnerable to cyberattacks. This paper proposes a cybersecurity assessment approach designed to assess the cyberphysical security of EPS. The work takes into consideration the intermittent generation of RES, vulnerabilities introduced by microprocessor-based electronic information and operational technology (IT/OT) devices, and contingency analysis results. The proposed approach utilizes deep reinforcement learning (DRL) and an adapted Common Vulnerability Scoring System (CVSS) score tailored to assess vulnerabilities in EPS in order to identify the optimal attack transition policy based on N-2 contingency results, i.e., the simultaneous failure of two system elements. The effectiveness of the work is validated via numerical and real-time simulation experiments performed on literature-based power grid test cases. The results demonstrate how the proposed method based on deep Q-network (DQN) performs closely to a graph-search approach in terms of the number of transitions needed to find the optimal attack policy, without the need for full observation of the system. In addition, the experiments present the method's scalability by showcasing the number of transitions needed to find the optimal attack transition policy in a large system such as the Polish 2383 bus test system. The results exhibit how the proposed approach requires one order of magnitude fewer transitions when compared to a random transition policy.

Paper Title