E. Laxmi Lydia
The advent of big data has driven significant interest in scalable data processing frameworks. This study compares two prominent frameworks, Apache Spark and MapReduce, by measuring their performance on the K-Means clustering algorithm. We evaluate their efficiency in terms of scheduling delay, speedup, and energy consumption. Apache Spark, designed to overcome the rigid structure of MapReduce, offers a more versatile processing model suited to a broader range of workloads. Our findings show that Apache Spark outperforms MapReduce in speed and flexibility, making it a strong candidate for large-scale data applications. The paper discusses the strengths and limitations of each framework and offers guidance for selecting an appropriate architecture for big data processing tasks.