GPU621/Apache Spark

25 bytes added, 13:52, 30 November 2020
=== Performance ===
== Analysis: Spark vs Hadoop Wordcount Performance ==
=== Methodology ===
Hadoop and Spark clusters can be deployed in cloud environments such as the Google Cloud Platform or Amazon EMR.
# Drag and drop the word-count.py file below into the browser, or use 'UPLOAD FILES' to upload it.
# word-count.py
#!/usr/bin/env python
 
import pyspark
import sys
 
# Expect exactly two arguments: the input URI and the output URI.
if len(sys.argv) != 3:
    raise Exception("Exactly 2 arguments are required: <inputUri> <outputUri>")
 
inputUri = sys.argv[1]
outputUri = sys.argv[2]
 
# Create the Spark context and load the input file as an RDD of lines.
sc = pyspark.SparkContext()
lines = sc.textFile(inputUri)
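
The listing above stops after reading the input file. As a rough illustration of what a word count computes, the sketch below emulates the usual split, pair, and reduce-by-key steps in plain Python, with no Spark cluster required. The function name `count_words` and the sample lines are hypothetical and not part of the original script; in PySpark the same steps would typically be written with `flatMap`, `map`, and `reduceByKey` on the RDD.

```python
from collections import defaultdict


def count_words(lines):
    """Count whitespace-separated words across an iterable of text lines."""
    # Step 1 (comparable to flatMap): split every line into words.
    words = (word for line in lines for word in line.split())
    # Step 2 (comparable to map): pair each word with a count of 1.
    pairs = ((word, 1) for word in words)
    # Step 3 (comparable to reduceByKey): sum the counts per distinct word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical sample input, used only for illustration.
    sample = ["to be or not to be", "that is the question"]
    for word, total in sorted(count_words(sample).items()):
        print(word, total)
```

Running the logic locally on a small sample like this can be a quick sanity check before submitting the job to a cluster.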