Paper Title
vue4logs -- Automatic Structuring of Heterogeneous Computer System Logs
Paper Authors
Abstract
Computer system log data is commonly used in system monitoring, performance characterization, workflow modeling, and anomaly detection. Log data is inherently unstructured or semi-structured, which makes it hard to understand a system's event flow or other important information by reading raw logs. The process of structuring log files first groups log messages according to the system events that triggered them, and then extracts an event template to represent the log messages of each event. This paper introduces a novel method for extracting event templates from raw system log files: it uses the vector space model, commonly used in the field of Information Retrieval, to vectorize log data and group log messages into event templates based on their vector similarity. The template extraction process is further enhanced with character-based and length-based filters. When evaluated on publicly available real-world log data benchmarks, the proposed method outperforms all available state-of-the-art systems in terms of accuracy and robustness.
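To make the idea in the abstract concrete, the following is a minimal sketch of similarity-based grouping of log messages. It is not the authors' implementation. The assumptions are: messages are split on whitespace, vectors are plain term-frequency counts, similarity is cosine similarity, only a token-count (length) filter is applied, and the threshold of 0.6 is chosen arbitrarily. The names `vectorize`, `cosine`, and `group_logs` are hypothetical.

```python
import math
from collections import Counter


def vectorize(tokens):
    """Term-frequency vector for one tokenized log message (assumed weighting)."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def group_logs(messages, threshold=0.6):
    """Assign each message to the most similar existing group, or open a new one.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    groups = []  # each group: {"template": [tokens], "members": [message indices]}
    for idx, msg in enumerate(messages):
        tokens = msg.split()
        vec = vectorize(tokens)
        best, best_sim = None, 0.0
        for group in groups:
            # Length filter: only compare against templates with the same token count.
            if len(group["template"]) != len(tokens):
                continue
            sim = cosine(vec, vectorize(group["template"]))
            if sim > best_sim:
                best, best_sim = group, sim
        if best is not None and best_sim >= threshold:
            # Positions where the tokens differ become wildcards in the template.
            best["template"] = [
                a if a == b else "<*>" for a, b in zip(best["template"], tokens)
            ]
            best["members"].append(idx)
        else:
            groups.append({"template": tokens, "members": [idx]})
    return groups


if __name__ == "__main__":
    logs = [
        "Connection from 10.0.0.1 closed",
        "Connection from 10.0.0.2 closed",
        "Disk usage at 91 percent",
        "Connection from 10.0.0.3 closed",
    ]
    for group in group_logs(logs):
        print(" ".join(group["template"]), group["members"])
```

On the four sample messages, the three connection messages fall into one group whose template has a wildcard in place of the IP address, and the disk message forms its own group. A character-based filter, which the abstract also mentions, is omitted from this sketch.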