Profiler: Profile-Based Model to Detect Phishing Emails

Paper Title

Profiler: Profile-Based Model to Detect Phishing Emails

Authors

Shmalko, Mariya; Abuadbba, Alsharif; Gaire, Raj; Wu, Tingmin; Paik, Hye-Young; Nepal, Surya

Abstract

Email phishing has become more prevalent and more sophisticated over time. To combat this rise, many machine learning (ML) algorithms for detecting phishing emails have been developed. However, because these algorithms are trained on limited email data sets, they are not adept at recognising varied attacks and thus suffer from concept drift: attackers can introduce small changes in the statistical characteristics of their emails or websites to bypass detection. Over time, a gap develops between the accuracy reported in the literature and the algorithm's actual effectiveness in the real world, which manifests as frequent false-positive and false-negative classifications. To address this, we propose a multidimensional risk assessment of emails that makes it harder for an attacker to adapt their email and avoid detection. This horizontal approach to email phishing detection profiles an incoming email on its main features. We develop a risk assessment framework that includes three models, which analyse an email's (1) threat level, (2) cognitive manipulation, and (3) email type; we combine their outputs to return a final risk assessment score. The Profiler does not need large training data sets to be effective, and its analysis of varied email features reduces the impact of concept drift. The Profiler can be used in conjunction with ML approaches, either to reduce their misclassifications or as a labeller for large email data sets in the training stage. We evaluate the efficacy of the Profiler against a machine learning ensemble built from state-of-the-art ML algorithms on a data set of 9000 legitimate and 900 phishing emails from a large Australian research organisation. Our results indicate that the Profiler mitigates the impact of concept drift and delivers 30% fewer false-positive and 25% fewer false-negative email classifications than the ML ensemble's approach.
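The abstract states that the outputs of the three models (threat level, cognitive manipulation, email type) are combined into one final risk score, but it does not say how. The sketch below is a hypothetical illustration only, assuming each model emits a score in [0, 1] and that the scores are merged by a weighted mean; the names `EmailProfile`, `risk_score`, `classify`, the weights, and the 0.5 threshold are all assumptions, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class EmailProfile:
    """Hypothetical per-email outputs of the three sub-models, each in [0, 1]."""
    threat_level: float            # (1) threat level
    cognitive_manipulation: float  # (2) cognitive manipulation
    email_type_risk: float         # (3) risk attached to the predicted email type


# Hypothetical weights: the abstract does not specify how scores are combined.
DEFAULT_WEIGHTS = (0.4, 0.35, 0.25)


def risk_score(profile: EmailProfile, weights=DEFAULT_WEIGHTS) -> float:
    """Combine the three dimension scores into one score in [0, 1] (weighted mean)."""
    scores = (
        profile.threat_level,
        profile.cognitive_manipulation,
        profile.email_type_risk,
    )
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("each dimension score must lie in [0, 1]")
    # Normalising by the weight sum keeps the result in [0, 1] for any positive weights.
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def classify(profile: EmailProfile, threshold: float = 0.5) -> str:
    """Label an email by comparing its combined risk score with an assumed threshold."""
    return "phishing" if risk_score(profile) >= threshold else "legitimate"


if __name__ == "__main__":
    print(classify(EmailProfile(0.9, 0.8, 0.7)))  # weighted score 0.815
```

A weighted mean lets a strong signal in one dimension be offset by weak signals in the others; a stricter variant could take the maximum of the three scores instead, so that any single high-risk dimension flags the email.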
