Using the Spark Shell

Please see my previous post for getting Spark set up.

In this post, we are going to go over using the Spark shell.

Step 1: Running the Spark shell

Start the Spark shell by running:

$ ~/spark/bin/spark-shell

or, on Windows:

c:\spark\bin\spark-shell.cmd

You should see something like this:
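(Exact output varies with your Spark and Scala versions, but the startup banner looks roughly like this:)

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Spark context available as sc.

scala>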


Step 2: Check out the Spark UI

Now that your shell is started, you should be able to browse to http://localhost:4040 and check out the Spark UI. Note that this is a different port (4040 vs. 8080) from the previous example.

Step 3: Spark context

Within the Spark shell, the variable sc is the SparkContext. Type sc at the Scala prompt and see what happens. Your output might look like this:
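For example (the id after the @ will differ on your machine):

scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@5e355b12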

To see all the methods available on the sc variable, type sc. and press TAB twice. This will show all the methods available on sc. (This only works in the Scala shell for now.)
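The listing is long; a truncated sample might look like this:

scala> sc.
accumulable     accumulator     addFile         addJar
appName         applicationId   broadcast       cancelAllJobs
...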

Try the following:

==> Print the name of the application
hint : sc.appName

==> Find the ‘Spark master’ for the shell
hint : sc.master
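For a local shell session, the output might look like this (the exact values depend on how the shell was launched):

scala> sc.appName
res1: String = Spark shell

scala> sc.master
res2: String = local[*]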


Step 4: Load a file

Let’s load an example file, README.md, which contains the following text:

twinkle twinkle little star
how I wonder what you are
up above the world so high
like a diamond in the sky
twinkle twinkle little star

Let’s load the file:
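Assuming the file sits in the directory where you launched the shell (adjust the path otherwise), load it with sc.textFile:

val f = sc.textFile("README.md")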

==> What is the ‘type’ of f?
hint : type f on the console
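Typing f should print something like this (the RDD id and console line number will differ):

scala> f
res3: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD[1] at textFile at <console>:27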

==> Inspect the Spark shell UI on port 4040. Do you see any processing done? Why (or why not)?

==> Print the first line / record from the RDD
hint : f.first()
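With the sample file above, the first line comes back as a String:

scala> f.first()
res4: String = twinkle twinkle little star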

==> Again, inspect the Spark shell UI on port 4040. Do you see any processing done? Why (or why not)?

==> Print the first 3 lines of the RDD
hint : f.take(???) (provide the correct argument to take function)

==> Again, inspect the Spark shell UI on port 4040. Do you see any processing done? Why (or why not)?

==> Print all the contents of the file
hint : f.collect()

==> How many lines are in the file?
hint : f.count()

==> Inspect the ‘Jobs’ section in the shell UI (in your browser)
Also inspect the event timeline
