E. Laxmi Lydia
Big data storage management is a significant challenge in Hadoop cluster environments because data-intensive applications require a high degree of data access locality. Traditionally, high-performance computing has relied on dedicated servers for data storage and replication. This research introduces a 'Disparateness-Aware Scheduling algorithm' to address resource and job disparity in cluster environments. The proposed method uses K-centroids clustering to categorize resources, which minimizes scheduling delay, and it targets energy consumption in Hadoop clusters while enhancing system reliability. A novel provisioning mechanism combines load, energy, and network time into a fitness function that Particle Swarm Optimization (PSO) optimizes to select computing nodes. The study also addresses fault tolerance through cluster migration for failed nodes, which allows recomputation and PSO-based prediction of optimal nodes. Experimental results demonstrate improvements in scheduling length, delay, speed, failure ratio, and energy consumption compared to existing systems.
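To make the node-selection step concrete, the following is a minimal sketch of how a PSO search over candidate nodes might use a fitness function built from load, energy, and network time. It is not the paper's implementation: the `Node` fields, the normalization of each metric to the range 0 to 1, the weights, the PSO coefficients, and the encoding of a particle position as a rounded node index are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical per-node metrics, each assumed normalized to 0..1.
    load: float
    energy: float
    network_time: float


def fitness(node, w_load=0.4, w_energy=0.4, w_net=0.2):
    """Weighted cost of a node (lower is better). Weights are assumed."""
    return (w_load * node.load
            + w_energy * node.energy
            + w_net * node.network_time)


def pso_select_node(nodes, particles=10, iterations=30, seed=0):
    """Search the node indices with a one-dimensional PSO.

    Returns (index, fitness) of the best node found. PSO is a heuristic,
    so the result is not guaranteed to be the global minimum.
    """
    if not nodes:
        raise ValueError("nodes must not be empty")
    rng = random.Random(seed)
    hi = len(nodes) - 1

    def score(x):
        # Clamp the continuous position and round it to a node index.
        return fitness(nodes[int(round(min(max(x, 0.0), hi)))])

    pos = [rng.uniform(0, hi) for _ in range(particles)]
    vel = [0.0] * particles
    pbest = pos[:]                 # best position seen by each particle
    gbest = min(pos, key=score)    # best position seen by the swarm

    for _ in range(iterations):
        for i in range(particles):
            r1, r2 = rng.random(), rng.random()
            # Inertia plus pulls toward personal and global bests
            # (coefficients 0.7 and 1.5 are assumed, not from the paper).
            vel[i] = (0.7 * vel[i]
                      + 1.5 * r1 * (pbest[i] - pos[i])
                      + 1.5 * r2 * (gbest - pos[i]))
            pos[i] = min(max(pos[i] + vel[i], 0.0), hi)
            if score(pos[i]) < score(pbest[i]):
                pbest[i] = pos[i]
            if score(pos[i]) < score(gbest):
                gbest = pos[i]

    idx = int(round(gbest))
    return idx, fitness(nodes[idx])


if __name__ == "__main__":
    cluster = [Node(0.9, 0.8, 0.7), Node(0.2, 0.3, 0.1), Node(0.5, 0.5, 0.5)]
    print(pso_select_node(cluster))
```

In this sketch the same routine could be called again after a node failure, with the failed node removed from the candidate list, to pick a node for recomputation. A real scheduler would more likely search over assignments of many tasks to many nodes than over a single index.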