cloud

Spark Cluster on Google Compute Engine

What is Spark and Why?

Apache Spark is an open source cluster computing system that aims to make data analytics fast: both fast to run and fast to write. To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and it supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop. In the past, I wrote an intro on how to install Spark on GCE. Since then, I have wanted to follow up on the topic with a more real-world example: installing a whole cluster. Luckily for me, a reader of the blog did the work! With his approval, I want to share his script with you.
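The following is a minimal sketch of what bringing up a Spark standalone cluster on GCE can look like. It is not the reader's script. It assumes the VMs already exist, that Spark is unpacked at a hypothetical `/opt/spark` on each one, and that the `gcloud` CLI is configured locally. The code only builds the commands and does not run them. `gcloud compute ssh` starts the master with Spark's `sbin/start-master.sh`, then points each worker at it with `sbin/start-worker.sh spark://MASTER:7077`. Port 7077 is the standalone master's default. Releases before Spark 3.1 call this script `start-slave.sh`. The host names, the zone, and `standalone_cluster_commands` itself are illustrative assumptions.

```python
import shlex


def standalone_cluster_commands(master: str, workers: list[str],
                                zone: str = "us-central1-a",
                                spark_home: str = "/opt/spark",
                                port: int = 7077) -> list[list[str]]:
    """Build (but do not run) gcloud commands that start a Spark standalone cluster."""
    master_url = f"spark://{master}:{port}"

    def ssh(host: str, remote_cmd: str) -> list[str]:
        # Run one shell command on a Compute Engine instance over SSH.
        return ["gcloud", "compute", "ssh", host, f"--zone={zone}",
                f"--command={remote_cmd}"]

    # Bind the master to its instance name and port so it matches master_url.
    cmds = [ssh(master, f"SPARK_MASTER_HOST={master} SPARK_MASTER_PORT={port} "
                        f"{spark_home}/sbin/start-master.sh")]
    # Register every worker with the master.
    for worker in workers:
        cmds.append(ssh(worker, f"{spark_home}/sbin/start-worker.sh {master_url}"))
    return cmds


if __name__ == "__main__":
    for cmd in standalone_cluster_commands("spark-master",
                                           ["spark-worker-1", "spark-worker-2"]):
        # Paste into a shell, or execute with subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```

The master URL uses bare instance names. This relies on Compute Engine's internal DNS, which resolves instance names for VMs in the same VPC network. Setting `SPARK_MASTER_HOST` keeps the address the master binds to identical to the URL the workers use.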


How to Install Spark on Google Compute Engine

What is Google Compute Engine?

Compute Engine is an infrastructure-as-a-service (IaaS) offering that lets you run large-scale computing workloads on virtual machines hosted on Google’s infrastructure. By the way, if you want a new machine at your disposal in less than 5 minutes, it can be done in 5 easy steps.
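The snippet below shows one way to script VM creation. It is not necessarily the 5-step procedure mentioned in this post. It is a hedged sketch that assumes the current `gcloud` CLI, whose `gcloud compute instances create` command accepts `--zone`, `--machine-type`, `--image-family` and `--image-project`. The default zone, the `e2-standard-4` machine type, and the Debian 12 image are our own assumptions, and `gce_create_command` is a hypothetical helper. The code builds the command without running it.

```python
import shlex


def gce_create_command(name: str,
                       zone: str = "us-central1-a",
                       machine_type: str = "e2-standard-4",
                       image_family: str = "debian-12",
                       image_project: str = "debian-cloud") -> list[str]:
    """Build (but do not run) a gcloud command that creates one Compute Engine VM."""
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",                    # where the VM will live
        f"--machine-type={machine_type}",    # vCPU / memory shape
        f"--image-family={image_family}",    # newest image in this family
        f"--image-project={image_project}",  # project that publishes the image
    ]


if __name__ == "__main__":
    print(shlex.join(gce_create_command("spark-node")))
```

With the defaults, the printed line is `gcloud compute instances create spark-node --zone=us-central1-a --machine-type=e2-standard-4 --image-family=debian-12 --image-project=debian-cloud`. Returning a list rather than a string lets you pass it straight to `subprocess.run` without shell-quoting issues.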

What is Spark?

Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write. To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.

So, to get the best of both worlds, we can leverage the large-scale cloud infrastructure that Compute Engine offers and install Spark on it. Here are the steps you will need to follow.

Installation steps
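The complete installation steps are in the full post. The sketch below is an independent outline under our own assumptions, none of which come from the original post. It assumes a Debian 12 VM, OpenJDK 17 from the distribution's packages, and the Spark 3.5.1 `-bin-hadoop3` build from the Apache archive. It generates the shell commands for a single-node install, and `spark_install_commands` is a hypothetical helper.

```python
def spark_install_commands(version: str = "3.5.1",
                           prefix: str = "/opt") -> list[str]:
    """Build (but do not run) shell commands that install a Spark binary release."""
    name = f"spark-{version}-bin-hadoop3"
    url = f"https://archive.apache.org/dist/spark/spark-{version}/{name}.tgz"
    return [
        "sudo apt-get update",
        # Spark runs on the JVM, so a Java runtime comes first.
        "sudo apt-get install -y openjdk-17-jre-headless",
        # Fetch the prebuilt binary release from the Apache archive.
        f"curl -fsSLO {url}",
        f"sudo tar -xzf {name}.tgz -C {prefix}",
        # Stable path so scripts can refer to <prefix>/spark regardless of version.
        f"sudo ln -sfn {prefix}/{name} {prefix}/spark",
    ]


if __name__ == "__main__":
    # Chain the steps with && so the sequence stops at the first failure.
    print(" && ".join(spark_install_commands()))
```

You can run the printed line on the VM, for example over `gcloud compute ssh`. It should leave Spark under `/opt/spark`. Change `version` to whichever release you target, as long as a `-bin-hadoop3` build of it is published.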
