Employing K-Means for Performance Analysis between Apache Spark and MapReduce

E. Laxmi Lydia

Volume 17, Issue 12 (2023) Section 1 Paper ID: LZqOO

Affiliation

Department of Computer Science, Vignan's Foundation for Science, Technology & Research, India

Abstract

The advent of big data has driven significant interest in scalable data processing frameworks. This study compares two prominent frameworks, Apache Spark and MapReduce, by analyzing their performance on the K-Means clustering algorithm. We evaluate their efficiency using parameters such as scheduling delay, speedup, and energy consumption. Apache Spark, designed to overcome the rigid structure of MapReduce, offers a more versatile processing model suited to a broader range of workloads. Our findings show that Apache Spark outperforms MapReduce in speed and flexibility, making it a strong candidate for large-scale data applications. The paper also discusses the strengths and limitations of each framework and offers guidance on selecting an appropriate architecture for big data processing tasks.
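To make the benchmark workload concrete, the following is a minimal single-machine sketch of one way K-Means can be phrased as a map step (assign each point to its nearest centroid) and a reduce step (average the points of each cluster), which is the shape both frameworks distribute across a cluster. It is not the paper's implementation. The function names `assign`, `recompute`, `kmeans`, and `speedup` are hypothetical, and the speedup definition used here (baseline runtime divided by candidate runtime) is an assumption, since the abstract does not define the metric.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def assign(point, centroids):
    """Map step: return (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def recompute(pairs, centroids):
    """Reduce step: average the points assigned to each centroid."""
    groups = defaultdict(list)
    for idx, point in pairs:
        groups[idx].append(point)
    new_centroids = list(centroids)  # an empty cluster keeps its old centroid
    for idx, members in groups.items():
        new_centroids[idx] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return new_centroids


def kmeans(points, centroids, max_iter=20):
    """Repeat map and reduce until the centroids stop moving."""
    for _ in range(max_iter):
        pairs = [assign(p, centroids) for p in points]
        updated = recompute(pairs, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


def speedup(baseline_seconds, candidate_seconds):
    """Assumed definition: baseline runtime divided by candidate runtime."""
    return baseline_seconds / candidate_seconds
```

For example, `kmeans([(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)], [(0.0, 0.0), (10.0, 10.0)])` returns `[(0.0, 1.0), (10.0, 11.0)]`. The loop in `kmeans` is where the two frameworks typically differ: a MapReduce implementation usually runs each iteration as a separate job that rereads its input from storage, whereas Spark can keep the point set cached in memory across iterations.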
