Paper Title
Leveraging Log Instructions in Log-based Anomaly Detection
Paper Authors
Paper Abstract
Artificial Intelligence for IT Operations (AIOps) describes the process of maintaining and operating large IT systems using diverse AI-enabled methods and tools, e.g., for anomaly detection and root cause analysis, to support the remediation, optimization, and automatic initiation of self-stabilizing IT activities. The core step of any AIOps workflow is anomaly detection, typically performed on high-volume heterogeneous data such as log messages (logs), metrics (e.g., CPU utilization), and distributed traces. In this paper, we propose a method for reliable and practical anomaly detection from system logs. It overcomes the common disadvantage of related works, i.e., the need for a large amount of manually labeled training data, by building an anomaly detection model with log instructions from the source code of 1000+ GitHub projects. The instructions from diverse systems contain rich and heterogeneous information about many different normal and abnormal IT events and serve as a foundation for anomaly detection. The proposed method, named ADLILog, combines the log instructions and the data from the system of interest (target system) to learn a deep neural network model through a two-phase learning procedure. The experimental results show that ADLILog outperforms the related approaches by up to 60% on the F1 score while satisfying core non-functional requirements for industrial deployments such as unsupervised design, efficient model updates, and small model sizes.