Difference between revisions of "GPU621/Apache Spark"

From CDOT Wiki
(GPU621/Apache Spark)
m
Line 1: Line 1:
=GPU621/Apache Spark=
+
= Comparing Apache Hadoop and Apache Spark =
  
The MapReduce parallel programming model covered in this course was arguably made famous by Google. The company used it to process massive data sets in parallel on a distributed cluster in order to index the web for accurate and efficient search results. Apache Hadoop, the open-source platform inspired by Google’s early proprietary technology, has been one of the most popular big data processing frameworks. However, in recent years its usage has been declining in favor of other increasingly popular technologies, namely Apache Spark. We will introduce the history and advantages (scalability, flexibility, resilience) that led to the popularization of Apache Hadoop for certain big data applications. Furthermore, our project will focus on demonstrating how a particular use case performs in Apache Hadoop versus Apache Spark, and how this relates to the rising and waning adoption of Spark and Hadoop respectively.
+
MapReduce was famously used by Google to process massive data sets in parallel on a distributed cluster in order to index the web for accurate and efficient search results. Apache Hadoop, the open-source platform inspired by Google’s early proprietary technology, has been one of the most popular big data processing frameworks. However, in recent years its usage has been declining in favor of other increasingly popular technologies, namely Apache Spark. This project will focus on demonstrating how a particular use case performs in Apache Hadoop versus Apache Spark, and how this relates to the rising and waning adoption of Spark and Hadoop respectively. It will compare the advantages of Apache Hadoop versus Apache Spark for certain big data applications.
  
 
== Group Members ==
 
== Group Members ==
 
# Akhil Balachandran
 
# Akhil Balachandran
 
# Daniel Park
 
# Daniel Park
# Patrick O'Reilly
 
  
 
== Introduction ==
 
== Introduction ==
 +
 +
=== Apache Hadoop ===
 +
 
=== Apache Spark ===
 
=== Apache Spark ===
 +
  
 
== Progress ==
 
== Progress ==
 +
 +
1. Nov 9, 2020 - Added project description
 +
2. Nov 20, 2020 - Added outline for project
 +
3.
 +
 +
= References =

Revision as of 16:28, 20 November 2020

Comparing Apache Hadoop and Apache Spark

MapReduce was famously used by Google to process massive data sets in parallel on a distributed cluster in order to index the web for accurate and efficient search results. Apache Hadoop, the open-source platform inspired by Google’s early proprietary technology, has been one of the most popular big data processing frameworks. However, in recent years its usage has been declining in favor of other increasingly popular technologies, namely Apache Spark. This project will focus on demonstrating how a particular use case performs in Apache Hadoop versus Apache Spark, and how this relates to the rising and waning adoption of Spark and Hadoop respectively. It will compare the advantages of Apache Hadoop versus Apache Spark for certain big data applications.

Group Members

  1. Akhil Balachandran
  2. Daniel Park

Introduction

Apache Hadoop

Apache Spark

Progress

  1. Nov 9, 2020 - Added project description
  2. Nov 20, 2020 - Added outline for project
  3.

References