Paper Title
TSM: Measuring the Enticement of Honeyfiles with Natural Language Processing
Authors
Abstract
Honeyfile deployment is a useful breach detection method in cyber deception that can also inform defenders about the intent and interests of intruders and malicious insiders. A key property of a honeyfile, enticement, is the extent to which the file can attract an intruder to interact with it. We introduce a novel metric, Topic Semantic Matching (TSM), which uses topic modelling to represent files in the repository and semantic matching in an embedding vector space to compare honeyfile text and topic words robustly. We also present a honeyfile corpus created with different Natural Language Processing (NLP) methods. Experiments show that TSM is effective in inter-corpus comparisons and is a promising tool to measure the enticement of honeyfiles. TSM is the first measure that uses NLP techniques to quantify the enticement of honeyfile content: it compares the essential topical content of the local context with that of the honeyfiles and is robust to paraphrasing.
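
As a rough illustration of the idea of matching honeyfile text against topic words in an embedding space, the following minimal sketch scores a file by averaging, over the topic words, the best cosine similarity to any word in the file. This is not the formula from the paper, which the abstract does not give. The scoring rule, the function name `topic_match_score`, the hard-coded topic words (standing in for the output of a topic model), and the three-dimensional toy vectors are all assumptions made for illustration.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# A real setup would load pretrained embeddings instead.
TOY_EMBEDDINGS = {
    "invoice": (0.9, 0.1, 0.0),
    "payment": (0.8, 0.2, 0.1),
    "budget": (0.7, 0.3, 0.0),
    "salary": (0.8, 0.1, 0.2),
    "recipe": (0.0, 0.1, 0.9),
    "garden": (0.1, 0.0, 0.8),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def topic_match_score(topic_words, file_words, embeddings):
    """Hypothetical score: mean over topic words of the best match in the file.

    Words missing from the embedding table are skipped; the score is 0.0
    when either side has no known words.
    """
    topic_vecs = [embeddings[w] for w in topic_words if w in embeddings]
    file_vecs = [embeddings[w] for w in file_words if w in embeddings]
    if not topic_vecs or not file_vecs:
        return 0.0
    best = [max(cosine(t, f) for f in file_vecs) for t in topic_vecs]
    return sum(best) / len(best)


# Topic words are hard-coded here; in practice they would come from a
# topic model fitted on the files of the local repository.
topic_words = ["invoice", "payment", "budget"]
on_topic = topic_match_score(topic_words, ["salary", "payment"], TOY_EMBEDDINGS)
off_topic = topic_match_score(topic_words, ["recipe", "garden"], TOY_EMBEDDINGS)
print(f"on-topic honeyfile:  {on_topic:.3f}")
print(f"off-topic honeyfile: {off_topic:.3f}")
```

Because the comparison is made between word vectors and not exact strings, a file that uses "salary" where the topic uses "payment" still scores highly, which is the intuition behind robustness to paraphrasing.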