
The Real A Team

No change in size, 22:43, 6 April 2016
Using Apache Spark
==PreAssignment==
This assignment goes over how to do a simple word count in Scala using Spark.
 
===Introduction To Spark===
To introduce myself to Spark, I watched a series of YouTube videos filmed at a Spark conference. The videos can be found here: https://www.youtube.com/watch?v=nxCm-_GdTl8 The playlist includes 7 videos that go over what Spark is and how it is meant to be used.
 
In summary, Spark is built to run big-data workloads across many machines. It is not meant for transactional processing, but for the analysis of data.
 
Spark is built around the RDD (Resilient Distributed Dataset). This means that Spark does not edit the data that is passed in, but rather uses the data to perform transformations (filters, joins, maps, etc.) and then actions (reductions, counts, etc.).
 
The results are stored into new datasets instead of altering existing ones.
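The transformation/action split can be sketched with plain Scala collections, which share the same functional style as the RDD API (this is an analogy, not actual Spark code — Spark's transformations are additionally lazy and distributed):

```scala
// Analogy for RDD semantics using plain Scala collections:
// a "transformation" returns a new dataset, leaving the original untouched,
// and an "action" reduces a dataset down to a result value.
val data = Vector(1, 2, 3, 4, 5)

val evens = data.filter(_ % 2 == 0)   // "transformation": produces a new dataset
val total = evens.sum                 // "action": produces a value

// The original dataset is unchanged.
println(data)    // Vector(1, 2, 3, 4, 5)
println(evens)   // Vector(2, 4)
println(total)   // 6
```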
 
RDDs are meant to be stored in memory for quick access; however, Spark is built so that, if necessary, RDDs can be written to disk (at a reduced I/O speed).
 
As mentioned above, there are three main steps in a Spark program: creation, transformation, and action.
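The three steps can be seen in the word count this assignment is about. A minimal sketch, assuming Spark is on the classpath, a local master, and a hypothetical input file `input.txt`:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Run locally on all cores; on a cluster the master URL would differ.
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    val lines = sc.textFile("input.txt")      // creation: build an RDD from a file
    val counts = lines
      .flatMap(_.split("\\s+"))               // transformation: lines -> words
      .map(word => (word, 1))                 // transformation: word -> (word, 1)
      .reduceByKey(_ + _)                     // transformation: sum counts per word

    counts.collect().foreach(println)         // action: pull results to the driver
    sc.stop()
  }
}
```

Note that nothing is computed until `collect()` runs; each transformation only builds a new RDD describing the result.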
====Setting Up The Scala Environment For Spark====
