Workload Characterization Of Hadoop Cluster

Research Article
C. M. Arun, D. Sathya and V. Monica
DOI: 
http://dx.doi.org/10.24327/ijrsr.2019.1005.3453
Subject: 
science
KeyWords: 
Hadoop; HDFS; MapReduce; YARN scheduler; job priority levels
Abstract: 

As organizations start to use data-intensive cluster computing systems such as Hadoop for more applications, there is a growing need to share clusters between users. Various algorithms have been proposed to address the conflict between data locality and fairness. MapReduce has become the dominant computing paradigm for processing large-scale datasets on clusters with a large number of nodes, and it has proved useful in applications such as e-commerce, Web search, social networks, and scientific computation. Understanding the characteristics of MapReduce workloads is the key to making better configuration decisions and improving system throughput. To achieve better performance, a MapReduce scheduler must avoid unnecessary data transmission by enhancing data locality. MapReduce, an offline (batch) computing engine, addresses the problem of data that is too large to fit on a single machine. The framework comprises a JobTracker and TaskTrackers, where the JobTracker divides the given input dataset into chunks and sends them to the individual nodes. MapReduce is a programming model that supports distributed and parallel computing for data-intensive applications, and HDFS (the Hadoop Distributed File System) provides the underlying storage. The default scheduler is the YARN scheduler, which supports five job priority levels. Improving the performance of MapReduce involves improvements in system latency, memory settings, input/output bandwidth, and job parallelization. The factors that affect Hadoop performance are hardware, MapReduce configuration, HDFS, and shuffle tweaks.
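
For readers unfamiliar with the programming model summarized above, the following word-count job is a minimal sketch of a Hadoop MapReduce program, assuming the standard org.apache.hadoop.mapreduce API; it is an illustration, not code from the article. The map phase processes each input split in parallel and emits (word, 1) pairs, the shuffle groups pairs by key, and the reduce phase sums the counts. The setPriority call illustrates one of the five job priority levels mentioned in the abstract (VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW).

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobPriority;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: each input split is processed in parallel; one (word, 1)
  // pair is emitted per token.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: all counts for the same word arrive together after the
  // shuffle and are summed into a single total.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // One of the five job priority levels mentioned in the abstract.
    job.setPriority(JobPriority.NORMAL);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Because the JobTracker (or, under YARN, the ApplicationMaster) schedules map tasks on nodes that already hold the relevant HDFS blocks, a job written this way benefits from the data locality that the abstract identifies as central to scheduler performance.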