How to Stop Wasting Money on Google BigQuery in Google Data Studio

One of my responsibilities as a product manager is tracking the influence of product on key metrics like revenue or MRR. I also prefer to share this data with my team for us to be on the same line while we are developing new features or improving something that already exists. A dashboard with a number of key metrics is good choice here.

We use Google BigQuery as an analytics data warehouse so everyone in the team is able to get all the data using SQL queries. Cloud database for analytics is a huge advantage. We do not need to build any infrastructure and support it. Also we do not spend our time on getting data from different sources.

To build dashboards for key metrics I usually use Google Data Studio. It has a native connector for BigQuery so I just white a query or choose a table and all the data is there. While I was building a dashboard for the life was beautiful and shiny. Everything looked simple and perfectly working until I’ve found some weird behaviour of Data Studio.

How Google Data Studio makes requests to BigQuery

Let’s go to BigQuery UI and see what happened. I was very surprised when I saw that Data Studio makes separate requests for every single plot if I use custom query as a connection type. I have twelve plots in my dashboard so instead of one query I see twelve. It might also happen several times per day:

Google BigQuery UI

I expected that Data Studio would make one request to get aggregated data and then use it a source for all plots. So I went to Google Cloud Console to check resources consumption and look through the logs. All the suspicious requests were made by my account but I usually do not write the code like this:

Google Cloud Logs Viewer

The only source for such requests was Data Studio. Why is this important? Because BigQuery is a paid product. You pay for data storage and data processing while querying tables. There is a free tier and if you exceed it the price is not so high but for those who like SELECT * it is not the best choice.

In BigQuery if you make the same requests multiple times the result is cached and is charged only for the first time. But Data Studio makes different requests to build every plot in my case and does not use cached results. So I had to build some solution that utilizes cache and reduces the cost of queries.

Using separate table to store aggregated results

My first attempt was to use a view based on custom request. But it didn’t work because Data Studio made different request to that view and the problem was the same. So my next idea was to save the results of the query to separate table. Also I wanted to update that table daily to get fresh data to the dashboard.

After playing a little with BigQuery UI and digging through documentation I’ve realised that there is no way to natively schedule the execution of the query. I had to build my own solution so Mr Google helped me again to find what I was looking for. Armed with this great article, a little knowledge of Apps Script and BigQuery API documentation I’ve wrote this code:

function runQuery() {

  var configuration = {
    "query": {
    "useQueryCache": false,
    "destinationTable": {
          "projectId": "my_project_id",
          "datasetId": "my_dataset_id",
          "tableId": "my_table_id"
        },
    "writeDisposition": "WRITE_TRUNCATE",
    "createDisposition": "CREATE_IF_NEEDED",
    "allowLargeResults": false,
    "useLegacySql": false,
    "query": "SELECT * FROM `my_project_id.my_dataset_id.my_view_id`"
    }
  };

  var job = {
    "configuration": configuration
  };

  var jobResult = BigQuery.Jobs.insert(job, "my_project_id");

  Logger.log(jobResult);

}

Here I select all records from the view my_view_id based on my custom query and write aggregated results to my_table_id table. Both are stored in my_dataset_id dataset in my_project_id project. I scheduled execution of this script using time based trigger to ran daily between one and two AM:

Google Apps Script Project Trigger

And voila - the size of separate table is 12 KB. It is updated daily and only one request is made to aggregate raw data using view. I use this table as a data source for my fancy dashboard in Data Studio and sleep well. The total reduction in billed queries is proportional to the number of plots. For me it was twelve times, for you it might be even bigger.

In the end I want to share some useful links if you are new to BigQuery:

Any kind of better solutions are highly accepted.

Comments