Flow is a VantageCloud Lake service that lets data users upload external files into Lake quickly and easily.
Paired with the Visualization feature, Flow democratises quick insights into any data in Lake. See the video below for a demo.
How Flow works
Flow is a self-service, low-code/no-code utility. It lets data teams get started with Lake analytics quickly by lowering the complexity of onboarding data from Native Object Storage into BFS and OFS tables using the Primary Cluster's resources.
You can use Flow through the Lake Console GUI, which allows you to create, configure and monitor Flow ingest jobs (called “flows”).
An important characteristic of Flow is that it does not require users to write scripts, code jobs, or maintain ETL servers. It also creates the target table if it doesn’t exist.
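To put that in perspective, here is a minimal sketch of the manual path Flow automates: creating the target table yourself and reading from Native Object Storage with Teradata's READ_NOS table operator through the teradatasql Python driver. The host, credentials, database, table, bucket path, and authorization object below are placeholders for illustration, not values Flow uses.

```python
# Minimal sketch of the manual alternative Flow automates, assuming the
# teradatasql driver and the READ_NOS table operator; every name, host,
# and path below is a placeholder.
import teradatasql

con = teradatasql.connect(host="mylake.example.com", user="demo_user", password="***")
cur = con.cursor()

# Create the target table by hand (Flow would create it for you) and fill it
# with rows read directly from the object-storage bucket.
cur.execute("""
    CREATE TABLE sales_db.daily_sales AS (
        SELECT *
        FROM READ_NOS (
            USING
                LOCATION ('/s3/my-bucket.s3.amazonaws.com/sales/')
                AUTHORIZATION (sales_db.s3_auth)
        ) AS nos_data
    ) WITH DATA
""")

con.close()
```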
Flow supports up to twenty simultaneously running flows, each of which can ingest up to five data sources.
Flow is a data ingestion tool, so it can complement either ETL or ELT patterns:
- ETL, where the transformations happen on data that is in Native Object Storage, a common pattern in the Cloud.
- ELT, where the transformations take place in the database. Flow facilitates the data movement in this case, while tools such as dbt + Airflow perform the transformations (see the sketch after this list).
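To make the ELT hand-off concrete, below is a minimal Airflow sketch of the step that could follow a flow: once Flow has landed the raw data, dbt transforms it inside the database. The DAG id, schedule, and dbt project path are my own assumptions for illustration, not anything Flow provides.

```python
# Minimal Airflow sketch of the ELT hand-off: Flow lands the raw data,
# then dbt transforms it inside the database. DAG id, schedule, and the
# dbt project path are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="lake_elt_after_flow",      # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",                # align with the flow's load cadence
    catchup=False,
) as dag:
    # Run the dbt models that transform the tables Flow just loaded.
    transform = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt/lake_project",
    )
```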
In addition, Flow can load data in three modes:
- One-time batches.
- Continuous loads (stream mode), which check for new data at a user-defined interval (e.g., every minute) and bring in only the new data.
- Automated scheduled jobs, which let users define a time window during which Flow checks for new data and brings in only the data that has arrived in the Native Object Storage bucket since the previous scheduled load.
Note that in continuous and scheduled modes, Flow automatically detects new files in the bucket and loads them, checking at the user-defined interval (continuous) or on the schedule. One-time flows can also pick up new files in the AWS S3 bucket, but only when you run them again manually with the Run operation. The sketch below illustrates the detection idea.
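For intuition, here is a minimal sketch of that kind of detection written against S3 with boto3. This is not Flow's internal code, only an illustration of "pick up only the objects newer than the last check"; the bucket, prefix, and 60-second interval are placeholders.

```python
# Illustration of "only bring in the new files" detection, using boto3.
# Not Flow's implementation; bucket, prefix, and interval are placeholders.
import time
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
last_checked = datetime.min.replace(tzinfo=timezone.utc)

while True:
    resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="sales/")
    new_objects = [
        obj for obj in resp.get("Contents", [])
        if obj["LastModified"] > last_checked   # only files added since last check
    ]
    for obj in new_objects:
        print(f"would ingest {obj['Key']}")     # Flow would load this file
    last_checked = datetime.now(timezone.utc)
    time.sleep(60)                              # continuous mode: user-defined interval
```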
Get Started with Flow
Teradata offers online documentation about Flow. In addition, I wrote a cookbook to get Flow working for the first time on AWS, where I gathered my lessons learnt on:
- How to grant Flow permissions on an AWS S3 bucket (a one-time task, sketched after this list), and
- How to create a flow (an ad hoc task).
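As a rough idea of what the one-time permissions task involves, the sketch below uses boto3 to create an IAM policy granting read access on the bucket. The actions, policy name, and bucket ARN are assumptions for illustration; the cookbook and Teradata documentation give the exact permissions Flow needs.

```python
# Rough sketch of granting read access on an S3 bucket via an IAM policy.
# The actions, policy name, and bucket are illustrative assumptions; follow
# the cookbook and Teradata docs for the exact permissions Flow requires.
import json

import boto3

iam = boto3.client("iam")

policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::my-bucket",
                "arn:aws:s3:::my-bucket/*",
            ],
        }
    ],
}

iam.create_policy(
    PolicyName="flow-read-my-bucket",            # hypothetical policy name
    PolicyDocument=json.dumps(policy_document),
)
```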
Incidentally, I gave some tips on loading data into Lake in a previous post. That post doesn't focus on Flow, but it covers several of the options you have available.