A Multi-Dimensional Big Data Storing System for Generated COVID-19 Large-Scale Data using Apache Spark

Paper title

A Multi-Dimensional Big Data Storing System for Generated COVID-19 Large-Scale Data using Apache Spark

Authors

Elmeiligy, Manar A., Desouky, Ali I. El, Elghamrawy, Sally M.

Abstract

The ongoing outbreak of coronavirus disease (COVID-19) broke out in Wuhan, China, in December 2019. COVID-19 is caused by a new virus that had not previously been identified in humans. The epidemic then spread widely and rapidly throughout the world. Every day the number of confirmed cases rises rapidly, the number of suspected cases (identified from the symptoms that accompany the disease) grows, and, unfortunately, the number of deaths also increases. With case numbers growing around the world, it becomes hard to manage all of this case information across different situations: whether a patient is infected or only suspected, and which symptoms the patient has shown. There is therefore a critical need for a multi-dimensional system to store and analyze the large-scale data being generated. In this paper, a Comprehensive Storing System for COVID-19 data using Apache Spark (CSS-COVID) is proposed to handle and manage the problems caused by the daily increase in COVID-19 cases. CSS-COVID helps decrease the processing time for storing and querying daily COVID-19 data. CSS-COVID consists of three stages: an inserting and indexing stage, a storing stage, and a querying stage. In the inserting stage, the data is divided into subsets and each subset is indexed separately. The storing stage uses a set of storing nodes to store the data, while the querying stage handles query processing. Using Apache Spark in CSS-COVID improves performance when handling the large-scale coronavirus case data, which grows daily. A set of experiments using real COVID-19 datasets is conducted to demonstrate the efficiency of CSS-COVID in indexing large-scale data.
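The three stages named in the abstract (split into subsets and index each one, store subsets on separate nodes, answer queries across nodes) can be illustrated with a minimal, single-process Python sketch. It is not the authors' CSS-COVID implementation and does not use Apache Spark. The record fields (`id`, `status`, `country`), the modulo-on-id rule for assigning records to subsets, the choice of `status` as the indexed key, the sample records, and the names `StorageNode`, `insert_and_index`, and `query` are all hypothetical assumptions made only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Hypothetical storing node: holds one subset of records and its own local index."""
    records: dict = field(default_factory=dict)                      # record id -> record
    index: dict = field(default_factory=lambda: defaultdict(list))   # key value -> list of ids

    def store(self, record, key_field):
        # Storing stage: keep the record and index it in this node only.
        self.records[record["id"]] = record
        self.index[record[key_field]].append(record["id"])

    def lookup(self, key):
        # .get() does not create a new entry in the defaultdict for a missing key.
        return [self.records[i] for i in self.index.get(key, [])]


def insert_and_index(records, num_nodes, key_field="status"):
    """Inserting/indexing stage: split records into subsets and index each subset separately."""
    nodes = [StorageNode() for _ in range(num_nodes)]
    for record in records:
        # Assumed partitioning rule: integer id modulo the number of nodes.
        nodes[record["id"] % num_nodes].store(record, key_field)
    return nodes


def query(nodes, key):
    """Querying stage: consult every node's local index and merge the results by id."""
    results = []
    for node in nodes:
        results.extend(node.lookup(key))
    return sorted(results, key=lambda r: r["id"])


# Invented sample records, for illustration only.
records = [
    {"id": 1, "status": "confirmed", "country": "EG"},
    {"id": 2, "status": "suspected", "country": "CN"},
    {"id": 3, "status": "confirmed", "country": "IT"},
    {"id": 4, "status": "death", "country": "CN"},
]
nodes = insert_and_index(records, num_nodes=2)
print([r["id"] for r in query(nodes, "confirmed")])  # [1, 3]
```

Because each subset keeps its own index, a lookup never scans another node's data, but every query has to fan out to all nodes and merge the partial results. In a Spark deployment the subsets would presumably map to partitions processed in parallel; the abstract does not say how CSS-COVID maps its stages onto Spark.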
