In our world of ‘Big Data’, it can be time-consuming and expensive to query massive datasets without the right infrastructure. Google BigQuery solves this problem by enabling super-fast, SQL-like queries against append-only tables, using the processing power of Google’s infrastructure.
What do you need to do?
- Move your data into BigQuery – This is what we will do in this post.
- Let Google BigQuery handle the hard work.
- Query your big data with a smile, in a cost-effective way.
How do you upload data to BigQuery?
There are two main approaches: stream your data, or load it from Google Cloud Storage. Let’s look at the steps for using Google Cloud Storage to load data into BigQuery.
The main steps you need to follow:
- Prepare your data. At this stage, you need to analyze it and decide which format works best (both JSON and CSV are supported).
- In our example, we will work with CSV files: we will upload them to Google Cloud Storage and then use a BigQuery load job to pull the data into BigQuery automatically.
- Run a ‘sanity’ check to see that the new data is in good shape (optional step).
- Upload your data to a project with a meaningful name (the default project names are not very clear in most cases).
- Consider splitting your data (e.g. monthly tables instead of one big table), because this makes the data source easier to update, query and maintain in the future.
- Provide an example dataset with data that reflects the common cases. This gives developers a way to ‘play’ with the data and see its value.
- Think of some good, bold examples. A few sample queries are crucial to get people started on a dataset.
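To make the steps above more concrete, here is a minimal Python sketch of three of them: a quick sanity check on a CSV sample, a naming helper for monthly tables, and a load job that pulls a CSV file from Google Cloud Storage. It assumes the google-cloud-bigquery client library is installed and default credentials are configured. The helper names, the `_YYYYMM` table suffix and the project, dataset and bucket names are all hypothetical, chosen only for illustration.

```python
import csv
import io
from datetime import date


def monthly_table_id(project: str, dataset: str, base: str, day: date) -> str:
    """Build a per-month table ID, e.g. 'my-project.analytics.events_201401'.

    The '_YYYYMM' suffix is an assumed naming convention, not a BigQuery rule.
    """
    return f"{project}.{dataset}.{base}_{day:%Y%m}"


def sanity_check_csv(text: str, expected_columns: list) -> int:
    """Check the header and field count of a CSV sample; return the data row count."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    if header != expected_columns:
        raise ValueError(f"unexpected header: {header}")
    rows = 0
    for row in reader:
        if len(row) != len(expected_columns):
            raise ValueError(
                f"line {reader.line_num}: expected {len(expected_columns)} "
                f"fields, got {len(row)}"
            )
        rows += 1
    return rows


def load_csv_from_gcs(gcs_uri: str, table_id: str) -> int:
    """Run a BigQuery load job for a CSV file in Cloud Storage; return the table's row count."""
    # Imported here so the helpers above work without the client library installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses the default credentials and project
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # skip the header row
        autodetect=True,  # let BigQuery infer the schema
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    load_job = client.load_table_from_uri(gcs_uri, table_id, job_config=job_config)
    load_job.result()  # wait for the job to finish; raises if the load failed
    return client.get_table(table_id).num_rows
```

A call might look like `load_csv_from_gcs("gs://my-bucket/events-2014-01.csv", monthly_table_id("my-project", "analytics", "events", date(2014, 1, 1)))`, where the bucket and file names are again placeholders. Appending to one table per month keeps each load small and lets later queries touch only the months they need.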