Changes

GPU621/Apache Spark Fall 2022

25 bytes added, 23:08, 3 December 2022

===Create an S3 bucket===

Unlike the previous case, where the application ran on a single computer, it now has to run across multiple nodes. Reading the input from a local hard disk no longer makes sense: most of the time the file is too big for one node to handle, and a local disk is not shared with the other nodes. The file needs to live somewhere all nodes can reach, so we use a file on S3 as the input. S3 is another service AWS provides, and a single file on S3 can be as large as 5 TB. Creating a bucket is not covered here; please look up how to create a new bucket, and use it to hold both the input file and the application package. Make the bucket open to the public so you will not run into permission issues later on.

[[File: s3 bucket.png | 800px]]
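Since the bucket creation steps are skipped above, here is a minimal sketch of one way to script them, assuming Python with the third-party boto3 package and AWS credentials already configured on your machine. The bucket name, region, file names, and both helper functions are hypothetical placeholders, not part of the original walkthrough. The sketch does not change any access settings on the bucket.

```python
def s3_uri(bucket, key, scheme="s3"):
    """Build the URI that a cluster job would use to locate an object."""
    return f"{scheme}://{bucket}/{key.lstrip('/')}"


def create_bucket_and_upload(bucket, region, files):
    """Create a bucket and upload (local_path, key) pairs into it.

    Returns the S3 URIs of the uploaded objects.
    """
    import boto3  # third-party; imported here so s3_uri works without it

    s3 = boto3.client("s3", region_name=region)
    if region == "us-east-1":
        # us-east-1 is the default region and must not be passed
        # as a LocationConstraint.
        s3.create_bucket(Bucket=bucket)
    else:
        s3.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    for local_path, key in files:
        s3.upload_file(local_path, bucket, key)
    return [s3_uri(bucket, key) for _, key in files]


# Example call (all names below are made up for illustration):
# create_bucket_and_upload(
#     "my-spark-demo-bucket",
#     "us-east-1",
#     [("words.txt", "input/words.txt"), ("app.jar", "app/app.jar")],
# )
```

The returned URIs are the paths you would later hand to the cluster as the input file and application package locations. Bucket names are global across AWS, so the placeholder name will likely need to be changed to something unique.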

===Build the application===

RobinYu
