Print ISSN: 2155-3769/2689-5293 | E-ISSN: 2689-5307

Employing K-Means for Performance Analysis between Apache Spark and MapReduce

E. Laxmi Lydia

The advent of big data has driven significant interest in scalable data processing frameworks. This study compares two prominent frameworks, Apache Spark and MapReduce, analyzing their performance with the K-Means clustering algorithm. We evaluate their efficiency using parameters such as scheduling delay, speedup, and energy consumption. Apache Spark, designed to overcome the rigid structure of MapReduce, offers a versatile processing model suited to a broader range of workloads. Our findings show that Apache Spark outperforms MapReduce in speed and flexibility, making it a strong candidate for large-scale data applications. The paper discusses the strengths and limitations of each framework and offers guidance for selecting an appropriate architecture for big data processing tasks.
