Getting Started With Spark

Many of you may be interested in how to get going with Spark. Let’s walk through it step by step.

Step 0: Get Your System Ready

You need a working JDK, so download one if you don’t already have it. We recommend at least Java 7.

You also need Scala and sbt, if you don’t have them already.

Use the following link: https://www.scala-lang.org/download/

Once ready, open a shell window (or Windows command prompt) and make sure scala and sbt are both installed and on your path. Each command should start an interactive prompt.

$ scala

$ sbt
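If you would rather not launch each tool interactively, a quick check like the following can confirm what is on your path. This is a minimal sketch for Mac/Linux shells; check_tool is a hypothetical helper name, not part of Scala or sbt.

```shell
# Report whether a command can be found on the PATH.
# check_tool is a hypothetical helper name used only for this sketch.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: NOT found on PATH"
        return 1
    fi
}

# Check the three tools this walkthrough relies on; keep going on a miss.
for tool in java scala sbt; do
    check_tool "$tool" || true
done
```

Any tool reported as missing needs to be installed, or its bin directory added to your path, before moving on.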

Windows users have a few extra steps.

First, download winutils.exe from https://github.com/steveloughran/winutils/raw/master/hadoop-2.6.0/bin/winutils.exe

Put this in a directory such as C:\winutils\bin and add that directory to your path. This provides support for running Hadoop libraries on Windows. For more info, please see this link: https://wiki.apache.org/hadoop/WindowsProblems

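Run the following commands (again, Windows users only). The first makes the Hive scratch directory writable (create C:\tmp\hive first if it does not exist). The second points HADOOP_HOME at the directory that contains bin\winutils.exe.

C:\winutils\bin\winutils.exe chmod 777 C:\tmp\hive

set HADOOP_HOME=C:\winutils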

Ok, now we should be ready to run Spark.

Step 1:  Download Spark

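You can download the latest Spark release from http://spark.apache.org/downloads.html

This walkthrough uses the spark-2.1.0-bin-hadoop2.7.tgz package. Fetch it from one of the mirrors listed on that page (wget needs the full mirror URL, not just the file name), then unpack it:

$ wget <mirror URL>/spark-2.1.0-bin-hadoop2.7.tgz

$ tar xzf spark-2.1.0-bin-hadoop2.7.tgz

Once it is unpacked, I like to put the Spark directory, named spark, in my home directory. Mac and Linux users can do that as follows: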

$ mv spark-2.1.0-bin-hadoop2.7 ~/spark

Windows users can do something similar from a command prompt. Use move here, since rename cannot put the directory in a different location:

> move spark-2.1.0-bin-hadoop2.7 c:\spark
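On Mac or Linux, it can be worth confirming that the directory landed where you expect before trying to run anything. The following is a minimal sketch under the assumption that Spark now lives in ~/spark; SPARK_DIR and looks_like_spark_home are hypothetical names used only for this example.

```shell
# Return success if the given directory contains Spark's spark-submit launcher.
# looks_like_spark_home is a hypothetical helper name used only for this sketch.
looks_like_spark_home() {
    [ -f "$1/bin/spark-submit" ]
}

# Assumed install location from the step above; override by exporting SPARK_DIR.
SPARK_DIR="${SPARK_DIR:-$HOME/spark}"

if looks_like_spark_home "$SPARK_DIR"; then
    echo "$SPARK_DIR looks like a Spark install"
    # Print the Spark version banner; this needs a working Java on the PATH.
    "$SPARK_DIR/bin/spark-submit" --version || echo "spark-submit failed; is Java on your path?"
else
    echo "No bin/spark-submit under $SPARK_DIR"
fi
```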

Step 2: Run Spark

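You can start a standalone Spark master and worker as follows:

$ ~/spark/sbin/start-all.sh  #Mac/Linux

Note that the launch scripts live in sbin, not bin, and that they start the worker over ssh, so you need to be able to ssh to localhost. The launch scripts do not support Windows. There, start the master and a worker by hand, each in its own command prompt, replacing HOST with the host name from the spark:// URL that the master prints in its log output:

> c:\spark\bin\spark-class org.apache.spark.deploy.master.Master

> c:\spark\bin\spark-class org.apache.spark.deploy.worker.Worker spark://HOST:7077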

Spark is now running on your machine!
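To double-check from a terminal, you can look for the daemons with jps, the process lister that ships with the JDK. The sketch below assumes the standalone daemons show up in jps output under the names Master and Worker; count_daemon is a hypothetical helper name.

```shell
# Count lines of "PID Name" output (as printed by jps) whose name matches $1.
# count_daemon is a hypothetical helper name used only for this sketch.
count_daemon() {
    awk -v name="$1" '$2 == name { n++ } END { print n + 0 }'
}

if command -v jps >/dev/null 2>&1; then
    listing=$(jps)
    echo "Masters: $(printf '%s\n' "$listing" | count_daemon Master)"
    echo "Workers: $(printf '%s\n' "$listing" | count_daemon Worker)"
else
    echo "jps not found; it ships with the JDK"
fi
```

If either count is zero, the log files under the logs directory of your Spark install are the first place to look.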

Step 3: Check out the Spark UI

Go to http://localhost:8080 in your browser. This is the web UI of your Spark master. It should show the master’s spark:// URL along with tables of workers and applications.

Check the following things out:

  1. How many masters are running? How many workers?
  2. What nodes are they running on?
  3. What is the memory availability?
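The same information can also be pulled from a terminal, which can be handy when no browser is available. This is a minimal sketch, assuming the master UI is on the default port 8080 and that your Spark version serves a JSON summary under the /json/ path of that UI; master_json_url is a hypothetical helper name.

```shell
# Build the URL of the master's JSON summary (assumed to be served at /json/).
# master_json_url is a hypothetical helper; host and port default to localhost:8080.
master_json_url() {
    echo "http://${1:-localhost}:${2:-8080}/json/"
}

url=$(master_json_url)

if command -v curl >/dev/null 2>&1; then
    # -f: fail on HTTP errors, -sS: quiet but still show errors, 5 second timeout.
    curl -fsS --max-time 5 "$url" || echo "Could not reach $url; is the master running?"
else
    echo "curl not found; open $url in a browser instead"
fi
```

The response should list the workers along with their cores and memory, which covers the three questions above.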

 
