Teradata VantageCloud Lake - Considerations to Load Data

Considerations To Load Data Into VantageCloud Lake

As you may have read in a previous post about the VantageCloud Lake architecture, Lake strives to optimise costs and provide good performance while leveraging cloud-native features. Consequently, it has two proprietary file systems, Block File System (BFS) and Object File System (OFS), where you can store and read data.

Incidentally, you can find a summary of the critical VantageCloud Lake elements and the basis for using them in the post VantageCloud Lake in a Nutshell (it includes an infographic).

In this post, I explain several aspects of Lake that impact how you should load data, and I make some bold recommendations about where you should create your tables.

Coordinated Universal Time (UTC) time zone

All Lake environments are in the Coordinated Universal Time (UTC) time zone:

  • To standardise administration, and
  • To prevent issues related to Daylight Saving Time or other time zone manipulation requirements in various regions.

Such a configuration impacts organisations looking to migrate data and workloads from an existing Vantage environment where the system time zone configuration is currently set to something other than UTC.

Additionally, regardless of whether you migrate data from another Teradata database to Lake, keep in mind that Lake stores data in UTC. This affects regular operations, data loads (if your source data is in another time zone) and data ingestion (if you want to present data to users in their local time).

Migration from Enterprise to Lake

When you migrate to Lake, the source system must have a UTC internal time zone. In the case of Enterprise as a source system, the time zone is defined in DBS Control or Linux across all nodes. Note that you can also specify other time zones at the field level.

With UTC defined at the system level, Lake doesn’t automatically switch from Winter to Summer Time and vice versa.
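To illustrate what a fixed UTC system time zone implies, here is a minimal Python sketch using the standard library `zoneinfo` module. The zone `Europe/Madrid` and the dates are arbitrary choices for illustration and do not come from any Lake configuration. It shows that a local zone's offset changes between winter and summer, while UTC stays at zero.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Hypothetical local zone, chosen only for illustration.
LOCAL_ZONE = ZoneInfo("Europe/Madrid")


def local_offset_hours(utc_moment: datetime) -> float:
    """Return the local zone's offset from UTC, in hours, at a UTC instant."""
    local = utc_moment.astimezone(LOCAL_ZONE)
    return local.utcoffset() / timedelta(hours=1)


# The same wall-clock time in UTC, once in winter and once in summer.
winter = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
summer = datetime(2024, 7, 15, 12, 0, tzinfo=timezone.utc)

print(local_offset_hours(winter))  # local offset in winter
print(local_offset_hours(summer))  # local offset in summer
print(winter.utcoffset(), summer.utcoffset())  # UTC itself never shifts
```

In this example, the same 12:00 UTC instant corresponds to a different local hour in January and in July, so any conversion to local time has to happen when you present the data, not when you store it.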

If your source Enterprise database has TimeDateWZControl = 2 (DBS Control, General parameter 57), it automatically converts any timestamp to UTC before writing it to disk. Thus, you don’t need to transform the timestamps to UTC during the migration.

For any other value of TimeDateWZControl, you must adjust the timestamps to UTC. Notably, if TimeDateWZControl is not 2, you can’t simply change it, as doing so would affect the timestamps already stored in your database.

Mind you, the parameter TimeDateWZControl has been available since Teradata 13.10.

Load Data That Is in Local Time Zones

If you need to load files into Lake whose data is in a local time zone:

  • You must define the time zone at the session level if the load solution allows it, or
  • You must generate the source data in UTC.
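As a sketch of the second option, the following Python snippet rewrites a CSV extract so that its timestamp column is in UTC before you load it. It is only an illustration under stated assumptions: the column name `event_ts`, the timestamp format and the source zone `America/New_York` are hypothetical, and the code is not part of any Teradata tool.

```python
import csv
import io
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumptions for illustration only: source zone and timestamp format.
SOURCE_ZONE = ZoneInfo("America/New_York")
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_utc(local_text: str, zone: ZoneInfo = SOURCE_ZONE) -> str:
    """Interpret a naive local timestamp in `zone` and render it in UTC."""
    naive = datetime.strptime(local_text, TS_FORMAT)
    aware = naive.replace(tzinfo=zone)
    return aware.astimezone(timezone.utc).strftime(TS_FORMAT)


def convert_csv(source: str, ts_column: str = "event_ts") -> str:
    """Return a copy of the CSV text with `ts_column` converted to UTC."""
    reader = csv.DictReader(io.StringIO(source))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row[ts_column] = to_utc(row[ts_column])
        writer.writerow(row)
    return out.getvalue()


# One winter row and one summer row, both at 09:00 local time.
sample = "id,event_ts\n1,2024-01-15 09:00:00\n2,2024-07-15 09:00:00\n"
converted = convert_csv(sample)
print(converted)
```

Using a named zone rather than a fixed offset lets the conversion follow the source region's Daylight Saving Time rules. Local times that fall inside a clock change (ambiguous or non-existent hours) need an explicit policy, which this sketch does not cover.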

The following solutions allow you to include the source time zone in their jobs at the session level:

  • Data Copy (an integrated service in Lake to transfer data from other Vantage systems onto Lake), or
  • Teradata Tools and Utilities (TTU).

Where To Place Tables in Lake

As a rule of thumb, you should create all tables in OFS, especially if they are large, because it is the cheaper storage.

Also note that OFS is ACID-compliant, i.e. it supports any table where you run ACID (atomicity, consistency, isolation, durability) operations.

However, you may need to place a table in BFS to achieve better performance than in OFS, for example, when you execute tactical queries.

Furthermore, BFS includes several features you won’t find in OFS, such as temporal tables, tables with row-level security, Security Zones and Referential Integrity (review the Teradata documentation to check which features are available in BFS and OFS). So, if you want to use these features, you must create the table in BFS.

Like OFS, BFS supports ACID transactions.

Depending on cost and performance factors, you may choose one file system or another for some tables.
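The placement guidance in this section can be summarised as a small decision helper. This is a sketch of the rule of thumb only: the `TableProfile` class, the feature labels and the function name are hypothetical, and the feature list simply mirrors the examples mentioned above rather than the full Teradata documentation.

```python
from dataclasses import dataclass

# Features this post lists as available in BFS but not in OFS.
# Check the Teradata documentation for the current, complete list.
BFS_ONLY_FEATURES = frozenset({
    "temporal",
    "row_level_security",
    "security_zones",
    "referential_integrity",
})


@dataclass(frozen=True)
class TableProfile:
    """Hypothetical description of a table, used only for this sketch."""
    name: str
    features: frozenset = frozenset()
    tactical_queries: bool = False


def suggest_file_system(table: TableProfile) -> str:
    """Apply the rule of thumb: OFS by default, BFS when required or faster."""
    if table.features & BFS_ONLY_FEATURES:
        return "BFS"  # the feature is not available in OFS
    if table.tactical_queries:
        return "BFS"  # performance outweighs the cheaper storage
    return "OFS"  # default: cheaper storage


history = TableProfile("sales_history")
policy = TableProfile("policy_versions", features=frozenset({"temporal"}))
lookup = TableProfile("customer_lookup", tactical_queries=True)

for table in (history, policy, lookup):
    print(table.name, suggest_file_system(table))
```

A real decision would also weigh storage cost and query patterns per table, so treat the helper as a starting checklist rather than a rule engine.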

How to Load Data in OFS

To load data into OFS tables, you can use any of the techniques and solutions below:

  1. “INSERT SELECT” and “CREATE TABLE AS” from BFS tables and NOS files through the READ_NOS function.
  2. Data Copy.
  3. QueryGrid.

It is essential to realise that Teradata doesn’t support Teradata Parallel Transporter (TPT) for loading directly into OFS tables.

Conclusion

The fact that Lake is always in UTC is a crucial factor to keep in mind when loading, migrating or otherwise transferring data. If your source data is in a different time zone, you should transform it using one of the methods above.

Separately, it is of paramount importance where you create your tables, as you must balance a cost-effective system with the performance your business requires. OFS is the cheaper file system, and BFS is faster. However, some techniques are available to optimise access to OFS tables, such as creating Single-Table Join Indexes in BFS.


I updated this post with the link to the post VantageCloud Lake in a Nutshell on 10 December 2023.
