Big data processing systems store, process, and analyze large volumes of structured and unstructured data. Cluster analysis helps such systems identify hidden patterns in the data and the relationships between them. Running cluster analysis across the shared machines of a big data platform supports deriving these relationships and making decisions from data in context. Clustering can handle raw and tabular data in structured, semi-structured, and unstructured forms. The data need not be linear: clustering can reveal associative and correlative patterns and groupings. The main contribution of this paper is to gather and summarize recent big data clustering techniques, together with their strengths and weaknesses in distributed environments.