Difference between revisions of "GPU621/Apache Spark"

Revision as of 16:45, 20 November 2020

Group Members

  1. Akhil Balachandran
  2. Daniel Park



Apache Hadoop vs Apache Spark

MapReduce was famously used by Google to process massive data sets in parallel on a distributed cluster in order to index the web for accurate and efficient search results. Apache Hadoop, the open-source platform inspired by Google’s early proprietary technology, has been one of the most popular big data processing frameworks. However, in recent years its usage has been declining in favor of other increasingly popular technologies, namely Apache Spark. This project will demonstrate how a particular use case performs in Apache Hadoop versus Apache Spark, and how this relates to the rising adoption of Spark and the waning adoption of Hadoop. It will also compare the advantages of each framework for certain big data applications.


Introduction

Apache Hadoop

What is Apache Hadoop?

Applications

Apache Spark

What is Apache Spark?

Applications

Overview: Spark vs Hadoop

Advantage and Disadvantages

Parallelism

Performance

Analysis: Spark vs Hadoop

Methodology

Setup

Results

Conclusion

Progress

  1. Nov 9, 2020 - Added project description
  2. Nov 20, 2020 - Added outline and subsections

References

  1. https://hadoop.apache.org/
  2. https://spark.apache.org/