How to Install Spark on Google Compute Engine

What is Google Compute Engine?

Compute Engine is an infrastructure-as-a-service offering that lets you run large-scale computing workloads on virtual machines hosted on Google’s infrastructure. By the way, if you want a new machine up and running in less than 5 minutes, it can be done in 5 easy steps.

What is Spark?

Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write. To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.

To get the best of both worlds, we can take advantage of the large-scale cloud capacity that Compute Engine offers and install Spark on it. Here are the few steps you need to follow.

Installation steps

  1. Create a CentOS instance on GCE

    1. “A journey of a thousand miles begins with a single step,” and in our case this is the first one.

    2. (!) Important: use a machine with at least 3.8 GB of memory, because Spark won’t compile with less.

  2. SSH to your new machine. For example:
     gcutil --service_version="xxx1" --project="spark-testing-123" ssh --zone="europe-west1-a" "spark-box-3g"

  3. Install Java: sudo yum install java-1.7.0-openjdk-devel

    1. Make sure you install the ‘devel’ package, so that you get the full SDK. You can see which packages are available with:

       yum search java | grep 'java-'

    2. You might prefer 1.7 or 1.6, based on your other requirements.

    3. Another option is to make sure you install Python, Scala, and Java.

  4. Install Git: sudo yum install git

  5. wget one of the packages from the download page

  6. Run sbt/sbt assembly

  7. Run sbt/sbt package

  8. You are good to go! Try some of the examples under the spark directory.

    1. ./spark-shell

    2. Run one of the examples under examples/
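
Before kicking off the build, it can help to check the two prerequisites called out in the steps above: enough memory and a full JDK. The following is a minimal sketch and not part of the original steps. The function names (enough_memory, total_mem_kb, has_full_jdk, preflight) and the kB threshold are assumptions made for illustration. It reads MemTotal from /proc/meminfo, which is Linux-specific and usually reports a little less than the machine's nominal memory, so you may need to lower the threshold through the MIN_MEM_KB variable.

```shell
#!/usr/bin/env bash
# Preflight sketch: the names and the threshold below are illustrative assumptions.

# 3.8 GB expressed in kB (3.8 * 1024 * 1024, rounded down); override via the environment.
MIN_MEM_KB="${MIN_MEM_KB:-3984588}"

# Succeeds if the given amount of memory (in kB) meets the minimum.
enough_memory() {
  [ "$1" -ge "$MIN_MEM_KB" ]
}

# Prints the machine's total memory in kB (Linux only).
total_mem_kb() {
  awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

# Succeeds if javac is on the PATH, which is the case once the 'devel' JDK package is installed.
has_full_jdk() {
  command -v javac >/dev/null 2>&1
}

# Runs both checks and reports any problem on stderr.
preflight() {
  local status=0
  if ! enough_memory "$(total_mem_kb)"; then
    echo "Less than ${MIN_MEM_KB} kB of memory; the build may fail." >&2
    status=1
  fi
  if ! has_full_jdk; then
    echo "javac not found; install java-1.7.0-openjdk-devel." >&2
    status=1
  fi
  return "$status"
}
```

After sourcing this file, running something like preflight && sbt/sbt assembly from the Spark directory would start the build only when both checks pass.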


  1. Spark Downloads

  2. Spark Docs

  3. Google Compute Engine
