Running PySpark with Jupyter Notebooks

Many users would like to run Jupyter notebooks with PySpark.  Unfortunately, it’s not as simple as just starting “jupyter notebook”.  That will open a Jupyter notebook server, but PySpark code will not run in it, because no SparkContext gets created.

What we need to do is run Jupyter inside of PySpark.  To do that, we can set the following environment variables, which tell PySpark to use Jupyter as its driver Python:
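In bash, for example (PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS are Spark’s standard knobs for swapping in a different driver Python):

    # Use Jupyter as the Python interpreter for the PySpark driver
    export PYSPARK_DRIVER_PYTHON=jupyter
    # Tell Jupyter to start in notebook mode
    export PYSPARK_DRIVER_PYTHON_OPTS='notebook'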

One way to run this would be the following:
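With the variables above exported, starting PySpark as usual brings up the notebook server instead of the plain REPL (this assumes both pyspark and jupyter are on your PATH):

    pyspark

Or, equivalently, as a one-liner that sets the variables only for this invocation:

    PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS='notebook' pyspark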

This will open a Jupyter notebook window in which PySpark code runs inside your PySpark shell: the sc (SparkContext) and spark (SparkSession) objects that the shell normally creates should already be defined in each notebook, so no extra initialization is needed.
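A quick sanity check from a notebook cell (a minimal sketch; it assumes sc was predefined by the PySpark shell as described above):

    # Confirm the notebook is attached to a live SparkContext
    print(sc.version)
    # Run a trivial distributed job: sums 0..99 and prints 4950
    print(sc.parallelize(range(100)).sum())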

If you’re running on a cluster, you can pass the usual pyspark launch arguments (master URL, executor resources, and so on) so that the notebook’s jobs execute on the cluster, as in the sketch below.
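A sketch assuming a YARN cluster in the default client deploy mode; the resource numbers are illustrative, and the extra notebook options make the server reachable from your workstation when pyspark is launched on a remote edge node:

    PYSPARK_DRIVER_PYTHON=jupyter \
    PYSPARK_DRIVER_PYTHON_OPTS='notebook --no-browser --ip=0.0.0.0 --port=8888' \
    pyspark --master yarn --num-executors 4 --executor-memory 2g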