Fake News Identification using Machine Learning Algorithms Based on Graph Features

Paper Title

Fake News Identification using Machine Learning Algorithms Based on Graph Features

Author

Tian, Yuxuan

Abstract

The spread of fake news has long been a social issue, and the need to identify it has become evident now that its dangers are well recognized. In addition to causing unease among the public, it can have far more devastating consequences. For instance, unverified medical instructions might lead to deaths during a pandemic. This study aims to build a model for identifying fake news using graphs and machine learning algorithms. Instead of scanning the news content or user information, the research focuses explicitly on the spreading network, which shows the interconnections among people, and on graph features such as eigenvector centrality, the Jaccard coefficient, and the shortest path. Fourteen features are extracted from the graphs and tested in thirteen machine learning models. Analysis of these features and comparison of the models' test results indicate that propensity and centrality contribute strongly to the classification. The best-performing models reach scores of 0.9913 and 0.9987 on the Twitter15 and Twitter16 datasets, respectively, using a modified tree classifier and a Support Vector Classifier. The model can effectively predict fake news, help prevent the potential negative social impact of fake news, and offer a new perspective on graph feature selection for machine learning models.
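
To make the three graph features named in the abstract concrete, the following is a minimal sketch of how each could be computed on a small propagation graph. It is not the author's implementation: the abstract does not describe the code, so the function names, the undirected adjacency-set representation, and the toy edge list are all assumptions chosen for illustration. Eigenvector centrality is approximated by power iteration on A + I, which has the same eigenvectors as the adjacency matrix A but avoids oscillation on tree-like (bipartite) graphs.

```python
from collections import deque
from math import sqrt


def build_graph(edges):
    """Build undirected adjacency sets from (u, v) pairs (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard_coefficient(adj, u, v):
    """Shared neighbours divided by all neighbours of u and v."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def shortest_path_length(adj, source, target):
    """Number of hops from source to target via BFS, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in adj[node]:
            if nbr == target:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None


def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Approximate eigenvector centrality by power iteration on A + I."""
    score = {n: 1.0 for n in adj}
    for _ in range(iterations):
        # Each node keeps its own score (the "+ I" part) and adds its neighbours'.
        new = {n: score[n] + sum(score[m] for m in adj[n]) for n in adj}
        norm = sqrt(sum(x * x for x in new.values())) or 1.0
        new = {n: x / norm for n, x in new.items()}
        converged = sum(abs(new[n] - score[n]) for n in adj) < tol
        score = new
        if converged:
            break
    return score


# Toy cascade (made-up data): "src" posts, a/b/c reshare, d sees it via a and b.
toy_edges = [("src", "a"), ("src", "b"), ("src", "c"), ("a", "d"), ("b", "d")]
toy_graph = build_graph(toy_edges)

print(jaccard_coefficient(toy_graph, "a", "b"))
print(shortest_path_length(toy_graph, "c", "d"))
print(eigenvector_centrality(toy_graph))
```

In a pipeline like the one the abstract outlines, values such as these would be aggregated per propagation graph into a fixed-length feature vector and passed to a classifier; how the paper aggregates its fourteen features is not stated in the abstract.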
