Travels in time with BigQuery

Oct 7, 2022 Big query, Gcp, Data engineering 2 min read

The travels in time are possible in BigQuery. Not many of the BigQuery users are aware of this feature, although it might be incredibly useful in a daily life of any data engineer. There are multiple occasions when you can leverage the benefits of using time travel. Imagine, you run a data pipeline that breaks the result table, or you did a mistake in your sql query that destroyed a table used by other team-mates. In all of these cases you can restore the table from the most recent state.

SELECT *
FROM `MY_DATASET.MY_TABLE`
  FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);

here the SYSTEM_TIME defines the exact time stamp from which we would like to restore the state from, in this case it will be the point of time in 1 hours from now, the format would be “2022-11-07 10:44:28.409401 UTC”.

Another case when you would use the time travel more often is the troubleshooting. If you are wondering when a specific part of the data was introduced to the table, or how a specific subset of rows was changing over the last 7 days, you can leverage the following query:

SELECT
* except (_CHANGE_TYPE,_CHANGE_TIMESTAMP),
_CHANGE_TYPE AS change_type,
_CHANGE_TIMESTAMP AS change_time
FROM
APPENDS(TABLE MY_DATASET.MY_TABLE, TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY), NULL);

here APPENDS returns a table of all rows appended to a table for a given time range

APPENDS(
  TABLE table,
  start_timestamp DEFAULT NULL,
  end_timestamp DEFAULT NULL)

where

The limitations

1. Storage time

BigQuery can be configured to store from 2 up to 7 days of history. The default configuration is 7 days. In case if the data is not available anymore, you will get the following error

Invalid snapshot time 1601168925462 for table
MY_PROJECT:MY_DATASET.MY_TABLE@1601168925462. Cannot read before 1601573410026.

There are several occasions when you would like to have a longer time window. In those cases the BigQuery snapshots would be a better option.

2. Row Level Security

If you have tables with row level security configured, then you would need to have Admin priviliges to these tables to access the functionality of the time travel. Particularly, the roles:

on the table level are required.

Summary

The time travel is a powerful feature of the BigQuery, it might come in handy on emergency situations as well as in a daily Data Engineering routine. And as many other tools it requires some conciseness. It needs to be used wisely, since the storage and queries will affect the end cost of the usage.

Share this post