GPU621/Apache Spark Fall 2022

25 bytes added, 23:08, 3 December 2022
===Create an S3 bucket===
Unlike the previous case, where we ran the application on a single computer, we now need to run it on several nodes. Reading the input from a local hard disk no longer makes sense, because most of the time the file will be too big for one node to handle. We need to put the file somewhere that all nodes can share, and an S3 object works well as the input file. S3 is another service AWS provides, and a single object on S3 can be as large as 5 TB. I will skip the details here; please look up how to create a new bucket, which will hold both the input file and the application package. Make the bucket open to the public so that you do not run into permission issues later on.
[[File: s3 bucket.png | 800px]]
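
For readers who prefer to script this step rather than use the console, the following is a minimal sketch in Python, assuming the boto3 package is installed and AWS credentials are already configured. The bucket name, region, and file names are hypothetical placeholders and do not come from this project. The helper also builds the URI that the cluster would later read from: <code>s3://</code> is the usual scheme on Amazon EMR, while <code>s3a://</code> is the scheme used by Hadoop's S3A connector.

```python
from typing import NamedTuple


class S3Location(NamedTuple):
    """A bucket/key pair identifying one object on S3."""
    bucket: str
    key: str

    def uri(self, scheme: str = "s3") -> str:
        # Build the URI that every node can use to reach the shared file.
        return f"{scheme}://{self.bucket}/{self.key}"


def create_bucket(name: str, region: str) -> None:
    """Create a bucket (requires boto3 and valid AWS credentials)."""
    import boto3  # imported lazily so the URI helper works without boto3

    s3 = boto3.client("s3", region_name=region)
    if region == "us-east-1":
        # us-east-1 is the default region and takes no location constraint.
        s3.create_bucket(Bucket=name)
    else:
        s3.create_bucket(
            Bucket=name,
            CreateBucketConfiguration={"LocationConstraint": region},
        )


def upload(local_path: str, target: S3Location) -> None:
    """Upload one local file to the given bucket and key."""
    import boto3

    boto3.client("s3").upload_file(local_path, target.bucket, target.key)


if __name__ == "__main__":
    # Hypothetical names, used only for illustration.
    input_file = S3Location("my-spark-bucket", "input/words.txt")
    app_package = S3Location("my-spark-bucket", "app/wordcount.jar")
    print(input_file.uri())
    print(app_package.uri())
```

Calling <code>create_bucket</code> once, followed by <code>upload</code> for the input file and for the application package, would leave both objects in the same bucket, which matches the layout described above. Bucket names are globally unique, so the placeholder name has to be replaced with one of your own.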
===Build the application===