Writing summary for the state-of-the-art methods for big data clustering in distributed environment

Paper title


Writing summary for the state-of-the-art methods for big data clustering in distributed environment

Author

Gyawali, Dipesh

Abstract


Big data processing systems store, process, and analyze huge volumes of structured and unstructured data. Cluster analysis helps these systems identify hidden patterns in the data and the relationships among them. Running cluster analysis across the shared machines of a big data platform helps derive those relationships and supports decisions that use the data in context. Clustering can be applied to every form of raw, tabular data as well as to structured, semi-structured, and unstructured data, and the data need not have a linear structure. It can reveal associative and correlative patterns and groupings. The main contribution of this paper is to gather and summarize recent big data clustering techniques, together with their strengths and weaknesses, across distributed environments.
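The abstract does not describe a specific algorithm. As a purely illustrative sketch (not taken from the paper), the snippet below shows one common way clustering can be spread over shared machines: k-means in a map/reduce style, where each machine assigns its local points to the nearest centroid and emits per-cluster sums and counts, and a coordinator merges those partial results to update the centroids. The partition layout, the function names (`map_partition`, `reduce_stats`, `distributed_kmeans`), and the fixed initial centroids are hypothetical choices for illustration; here the "machines" are simulated sequentially in one process.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]
Stats = Dict[int, Tuple[List[float], int]]  # cluster index -> (coordinate sums, point count)


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Return the index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_partition(partition: Sequence[Point], centroids: Sequence[Point]) -> Stats:
    """Worker-side step (hypothetical): assign local points and emit per-cluster sums and counts."""
    stats: Stats = {}
    for point in partition:
        k = nearest(point, centroids)
        total, count = stats.get(k, ([0.0] * len(point), 0))
        stats[k] = ([t + p for t, p in zip(total, point)], count + 1)
    return stats


def reduce_stats(all_stats: Iterable[Stats]) -> Stats:
    """Coordinator-side step (hypothetical): merge the partial statistics from every worker."""
    merged: Stats = {}
    for stats in all_stats:
        for k, (total, count) in stats.items():
            m_total, m_count = merged.get(k, ([0.0] * len(total), 0))
            merged[k] = ([a + b for a, b in zip(m_total, total)], m_count + count)
    return merged


def distributed_kmeans(
    partitions: Sequence[Sequence[Point]],
    initial_centroids: Sequence[Point],
    iterations: int = 10,
) -> List[Point]:
    """Run k-means where only small per-cluster summaries leave each partition."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        # On a real cluster each call would run on the machine holding that
        # partition; in this sketch the calls simply run one after another.
        partial = [map_partition(part, centroids) for part in partitions]
        merged = reduce_stats(partial)
        new_centroids: List[Point] = []
        for k, old in enumerate(centroids):
            if k in merged:
                total, count = merged[k]
                new_centroids.append(tuple(t / count for t in total))
            else:
                new_centroids.append(old)  # an empty cluster keeps its previous centroid
        if new_centroids == centroids:
            break  # assignments are stable, so the centroids have converged
        centroids = new_centroids
    return centroids


# Usage: two hypothetical machines, each holding part of a 2-D dataset.
partitions = [
    [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0)],  # machine A
    [(1.0, 0.5), (9.0, 8.5), (8.5, 9.0)],  # machine B
]
centroids = distributed_kmeans(partitions, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

The design choice worth noting is that workers send only per-cluster sums and counts, not raw points, so network traffic per iteration grows with the number of clusters rather than with the size of the data.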
