Mitigation Of Data Skew Using Block Chain Algorithm

Research Article
RajaSaranyaKumari R., Lenny Arul Jegan and Yuvasree J
DOI: xxx-xxxxx-xxxx
Subject: Science
Keywords: MapReduce, Data Skew, Hadoop, Block Chain, Web Crawling, Live Data Migration
Abstract: 

In Big Data applications, logs amounting to terabytes and petabytes of data need to be processed on large clusters. Hadoop processes such data using MapReduce, a programming model for distributed data processing. However, MapReduce jobs suffer from a data skew problem caused by the static partitioning method used in traditional Hadoop clusters: some tasks receive far more data than others, and the job cannot finish until its most heavily loaded task does. This reduces overall throughput. To overcome this problem, we propose the Block Chain algorithm, an efficient dynamic data-splitting strategy for Hadoop that monitors data samples while batch jobs run and allocates resources to slave nodes according to the complexity of the data and the time taken to process it. We also implement web crawling on Hadoop to reduce the same problem; because the crawl is distributed across the cluster, it avoids triggering DDoS attack detection on the servers being crawled. This further improves overall performance. We implement the approach in Hadoop, compare the results with the standard MapReduce technique, and show through experiments that it has negligible overhead and can speed up execution.
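The abstract contrasts static partitioning with a dynamic, sample-driven split. The Python sketch below illustrates that general idea only; it is not the authors' Block Chain algorithm, whose details are not given here. The static partitioner mimics the hash-modulo rule of Hadoop's default HashPartitioner. The dynamic partitioner is a swapped-in, hypothetical greedy scheme (longest-processing-time first) that assigns sampled keys to the currently least-loaded reducer. The function names, the key counts, and the assumption that every record costs one unit of processing time are all illustrative.

```python
import heapq
import zlib
from collections import Counter
from typing import Callable, Dict, Iterable, List


def static_partition(key: str, num_reducers: int) -> int:
    """Hash-modulo assignment, analogous to Hadoop's default HashPartitioner.

    zlib.crc32 is used instead of Python's built-in hash() so that the
    assignment is deterministic across runs.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_loads(records: Iterable[str], assign: Callable[[str], int],
                  num_reducers: int) -> List[int]:
    """Count how many records each reducer receives (1 record = 1 unit of work)."""
    loads = [0] * num_reducers
    for key in records:
        loads[assign(key)] += 1
    return loads


def build_dynamic_plan(sample: Iterable[str], num_reducers: int) -> Dict[str, int]:
    """Hypothetical greedy plan: the heaviest sampled keys go first to the
    currently least-loaded reducer (longest-processing-time first)."""
    freq = Counter(sample)
    heap = [(0, r) for r in range(num_reducers)]  # (estimated load, reducer id)
    heapq.heapify(heap)
    plan: Dict[str, int] = {}
    for key, count in freq.most_common():
        load, r = heapq.heappop(heap)
        plan[key] = r
        heapq.heappush(heap, (load + count, r))
    return plan


def dynamic_partition(plan: Dict[str, int], num_reducers: int) -> Callable[[str], int]:
    """Use the sampled plan; keys absent from the sample fall back to hashing."""
    def assign(key: str) -> int:
        return plan.get(key, static_partition(key, num_reducers))
    return assign


# Hypothetical skewed input: 18 records spread unevenly over 6 keys.
counts = {"k1": 5, "k2": 4, "k3": 3, "k4": 3, "k5": 2, "k6": 1}
records = [key for key, c in counts.items() for _ in range(c)]

if __name__ == "__main__":
    n = 3
    static = reducer_loads(records, lambda k: static_partition(k, n), n)
    # For simplicity the "sample" here is the whole input; a real job would
    # sample a fraction of the records while the batch job runs.
    plan = build_dynamic_plan(records, n)
    dynamic = reducer_loads(records, dynamic_partition(plan, n), n)
    print("static loads: ", static)
    print("dynamic loads:", dynamic)  # [6, 6, 6]
```

With this input the greedy plan balances the three reducers at 6 records each, which is the best possible for 18 records; the hash split cannot do better and may do worse, depending on where the keys happen to land. Weighting each key by an estimated per-record cost, rather than by count alone, would bring the sketch closer to allocating work by data complexity and processing time as the abstract describes.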