Paper Title
System Log Parsing: A Survey
Paper Authors
Abstract
Modern information and communication systems have become increasingly challenging to manage. The ubiquitous system logs contain plentiful information and are thus widely exploited as an alternative source for system management. As log files usually encompass large amounts of raw data, manually analyzing them is laborious and error-prone. Consequently, many research endeavors have been devoted to automatic log analysis. However, these works typically expect structured input and struggle with the heterogeneous nature of raw system logs. Log parsing closes this gap by converting unstructured system logs into structured records. Many parsers have been proposed over the past decades to accommodate various log analysis applications. However, due to the large solution space and the lack of systematic evaluation, it is not easy for practitioners to find ready-made solutions that fit their needs. This paper aims to provide a comprehensive survey on log parsing. We begin with an exhaustive taxonomy of existing log parsers. Then we empirically analyze, both quantitatively and qualitatively, the critical performance and operational features of 17 open-source solutions, and, whenever applicable, discuss the merits of alternative approaches. We also elaborate on future challenges and discuss the relevant research directions. We envision this survey as a helpful resource for system administrators and domain experts to choose the most desirable open-source solution or implement new ones based on application-specific requirements.
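To make the idea of converting unstructured log lines into structured records concrete, the sketch below applies it to three made-up log messages. It is a minimal illustration under a deliberately naive assumption (any token that looks like a number, IP address, hex value, or file path is treated as a variable and masked as `<*>`); it is not one of the 17 surveyed parsers. The function names `parse_line` and `parse_log`, the masking rules, and the `E1`/`E2` event IDs are all hypothetical.

```python
import re

# Hypothetical masking heuristic: tokens matching any of these patterns are
# treated as variable parts of a message and replaced by the wildcard "<*>".
VARIABLE_PATTERNS = [
    re.compile(r"^\d+(\.\d+)*$"),     # integers, dotted numbers, IPv4 addresses
    re.compile(r"^0x[0-9a-fA-F]+$"),  # hexadecimal values
    re.compile(r"^(/[\w.-]+)+$"),     # absolute file paths
]


def parse_line(line: str) -> tuple[str, list[str]]:
    """Split one raw log message into an event template and its parameters."""
    template_tokens, params = [], []
    for token in line.split():
        if any(pattern.match(token) for pattern in VARIABLE_PATTERNS):
            template_tokens.append("<*>")
            params.append(token)
        else:
            template_tokens.append(token)
    return " ".join(template_tokens), params


def parse_log(lines: list[str]) -> list[dict]:
    """Turn raw lines into structured records, one event ID per distinct template."""
    template_ids: dict[str, str] = {}
    records = []
    for line in lines:
        template, params = parse_line(line)
        # Assign a new event ID the first time a template is seen.
        event_id = template_ids.setdefault(template, f"E{len(template_ids) + 1}")
        records.append({"event_id": event_id, "template": template, "params": params})
    return records


logs = [
    "Connection from 10.0.0.1 closed after 35 seconds",
    "Connection from 10.0.0.7 closed after 2 seconds",
    "Failed to open /var/log/app.log",
]
records = parse_log(logs)
for record in records:
    print(record)
```

The first two lines collapse into a single template, `Connection from <*> closed after <*> seconds`, with different parameter lists, which is the kind of structured output downstream log analysis expects. Practical parsers replace the fixed regular expressions with more robust strategies, since hand-written masks rarely generalize across heterogeneous log sources.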