VantageCloud Lake manages the tables stored in the Object File System (OFS) through an internal versioning mechanism. This versioning matters for two reasons:
- It supports Time Travel, which is the ability to access the data in a table as it looked at a previous moment, and
- It allows transactionally consistent reading while there are ongoing write operations.
This post explains the Time Travel feature and the index structure within OFS used to maintain versioning.
Incidentally, for a summary of the key VantageCloud Lake elements and the rationale for using them, see the post VantageCloud Lake in a Nutshell (it includes an infographic).
Time Travel
As I pointed out, the Time Travel feature lets you query a table as its data looked at some point in the past. So, if a data load caused problems, you can go back to a point when the data was clean and consistent. In that sense, Time Travel serves as a backup mechanism in Lake.
Another benefit of Time Travel is reproducibility: if you ran an analysis a few weeks ago, you can re-run it with the same inputs for one or more tables.
To use Time Travel on OFS tables, you add an AS OF clause to your SELECT statement, specifying either a TIMESTAMP or a VERSION_NUMBER.
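To make the semantics concrete, here is a minimal Python sketch. It is not Teradata code. A hypothetical `VersionedTable` keeps every committed snapshot. The illustrative helpers `as_of_version` and `as_of_timestamp` return the snapshot that a query AS OF a VERSION_NUMBER or a TIMESTAMP would read. The sketch assumes that a timestamp resolves to the latest version committed at or before that moment.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class VersionedTable:
    """Hypothetical in-memory table that keeps every committed version.

    Assumption for illustration only: this is not Teradata's implementation.
    """
    # Each entry is (version_number, commit_time, rows); entries are never modified.
    versions: list = field(default_factory=list)

    def commit(self, commit_time: datetime, rows: list) -> int:
        # Every write produces a new immutable snapshot with the next version number.
        if self.versions and commit_time < self.versions[-1][1]:
            raise ValueError("commits must arrive in time order")
        version_number = len(self.versions) + 1
        self.versions.append((version_number, commit_time, tuple(rows)))
        return version_number

    def as_of_version(self, version_number: int) -> tuple:
        # Versions are numbered 1..N, so this is a direct lookup.
        if not 1 <= version_number <= len(self.versions):
            raise KeyError(f"version {version_number} is not available")
        return self.versions[version_number - 1][2]

    def as_of_timestamp(self, ts: datetime) -> tuple:
        # Assumed rule: pick the latest version committed at or before ts.
        commit_times = [v[1] for v in self.versions]
        idx = bisect_right(commit_times, ts)
        if idx == 0:
            raise KeyError(f"no version exists at or before {ts}")
        return self.versions[idx - 1][2]


sales = VersionedTable()
sales.commit(datetime(2023, 11, 1), [("A", 10), ("B", 20)])
sales.commit(datetime(2023, 11, 15), [("A", 10), ("B", 99)])  # a bad load
print(sales.as_of_timestamp(datetime(2023, 11, 10)))  # (('A', 10), ('B', 20))
print(sales.as_of_version(2))  # (('A', 10), ('B', 99))
```

Old snapshots are never overwritten in this model. Returning to the clean state before the bad load therefore only requires reading an earlier version.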
However, Lake doesn’t keep table versions indefinitely. How far back you can go depends on the instance’s default retention period, which I discuss in the next section.
Retention Periods
As previously mentioned, Lake doesn’t keep object versions in OFS indefinitely; a purge process removes outdated table versions according to a retention policy.
By default, the retention period is 30 days. However, you can ask Teradata to extend it to 90 days.
Note that extending the retention period gives you a deeper Time Travel window. However, it also consumes more storage, which increases storage costs.
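A small Python sketch can illustrate this trade-off. The purge rule in it is an assumption, not Teradata's documented algorithm. Under that rule, a version becomes purgeable once a newer version superseded it more than `retention_days` ago, and the current version is always kept. The helper `versions_to_keep` is hypothetical.

```python
from datetime import datetime, timedelta


def versions_to_keep(commit_times: list[datetime], now: datetime,
                     retention_days: int = 30) -> list[int]:
    """Return the 1-based version numbers that survive a purge.

    Hypothetical rule (an assumption, not Teradata's algorithm): keep the
    current version plus every version that was still current at some
    point inside the retention window, so any AS OF timestamp within the
    window can still be resolved.
    """
    cutoff = now - timedelta(days=retention_days)
    keep = []
    for i in range(len(commit_times)):
        # A version stops being current when the next version is committed.
        superseded_at = commit_times[i + 1] if i + 1 < len(commit_times) else None
        if superseded_at is None or superseded_at > cutoff:
            keep.append(i + 1)
    return keep


# Weekly loads from 1 September to 1 December 2023 (14 versions).
commits = [datetime(2023, 9, 1) + timedelta(weeks=w) for w in range(14)]
now = datetime(2023, 12, 1)
print(versions_to_keep(commits, now, 30))       # [9, 10, 11, 12, 13, 14]
print(len(versions_to_keep(commits, now, 90)))  # 14
```

With weekly loads, the 90-day window keeps all 14 versions, while the 30-day window keeps only 6. The extra versions give Time Travel more depth, and they are also where the additional storage goes.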
Implementing Versioning for OFS Tables
The diagram below shows how the root-leaf index structure supports versioning when a single row is updated. As you can see, updating a single row requires deleting the old row and inserting a new one. Consequently, Lake creates two objects: one for the unchanged data and the other for the updated row.
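The following Python sketch models this idea under stated assumptions. The class and method names are hypothetical, and the code is not Lake's internal implementation. In the model, leaf objects are immutable, and each version has a root index that lists its leaf objects. An update rewrites the affected leaf without the old row and writes the new row to a separate leaf. The new root reuses the untouched leaves, while the old root still points to the original objects. That retained old root is what allows both Time Travel and consistent reads while a write is in progress.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    """Hypothetical immutable leaf object holding (key, value) rows."""
    object_id: int
    rows: tuple


class ObjectStore:
    """Assumed root-leaf model for illustration; not Teradata's code."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.objects = {}  # object_id -> Leaf; objects are never modified
        self.roots = []    # roots[v - 1] is the tuple of leaf ids for version v

    def put(self, rows) -> int:
        leaf = Leaf(next(self._ids), tuple(rows))
        self.objects[leaf.object_id] = leaf
        return leaf.object_id

    def commit_root(self, leaf_ids) -> int:
        # Publishing a new root is what makes a new version visible.
        self.roots.append(tuple(leaf_ids))
        return len(self.roots)

    def read(self, version: int) -> dict:
        # A reader resolves exactly one root, so it sees a consistent
        # snapshot even if a writer publishes a newer root meanwhile.
        return {k: v for oid in self.roots[version - 1]
                for k, v in self.objects[oid].rows}

    def update_row(self, key, new_value) -> int:
        current = self.roots[-1]
        # Locate the leaf that holds the row being updated.
        old_id = next((oid for oid in current
                       if any(k == key for k, _ in self.objects[oid].rows)), None)
        if old_id is None:
            raise KeyError(key)
        unchanged = [(k, v) for k, v in self.objects[old_id].rows if k != key]
        # Copy-on-write: "delete" the old row by writing a leaf without it,
        # and "insert" the new row into a separate new leaf.
        new_ids = ([self.put(unchanged)] if unchanged else []) + [self.put([(key, new_value)])]
        new_root = [oid for oid in current if oid != old_id] + new_ids
        return self.commit_root(new_root)


store = ObjectStore()
a = store.put([(1, "red"), (2, "green")])
b = store.put([(3, "blue")])
v1 = store.commit_root([a, b])
v2 = store.update_row(2, "yellow")
print(store.read(v1))  # {1: 'red', 2: 'green', 3: 'blue'}
print(store.roots)     # [(1, 2), (2, 3, 4)]
```

Version 1 still resolves to the original leaves 1 and 2. Version 2 reuses leaf 2 and adds the two new leaves, so both versions remain readable.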
I updated this post with the link to the post VantageCloud Lake in a Nutshell on 10 December 2023.