I have worked extensively on Data Replication in the last few months, and I want to write several posts about my experience and learning. But first, I’ll introduce Data Replication here and the solution I work with: Qlik Replicate.

Qlik bought Attunity in May 2019 and renamed Attunity’s portfolio Qlik Data Integration (QDI).

Qlik Data Integration (QDI) portfolio

There are three products in the QDI portfolio that we can use independently:

  1. Qlik Replicate for Data Replication. I will elaborate further below.
    • Qlik Enterprise Manager is a separate component that monitors and tracks activity on Replicate and Compose.
  2. Qlik Compose assembles and provisions data automatically to fulfil business needs, for example, through Analytics (with BI tools such as Qlik Sense).
  3. Qlik Catalog is a data catalog management solution.

On a separate note, Data Replication is the process of sending data from a source storage system to a target one just as it is, without any modification. The goal is to capture data once and use it many times. The motivation is that if you store transformed data on the target and requirements change, you need to extract the data again; it can become a slippery slope towards many versions of the truth.

Once you replicate data, you transform and consume the data in the target as per your needs. For example:

  • Analytics (reporting with a BI tool such as Qlik Sense),
  • Input to Machine Learning models,
  • Improve data availability and accessibility,
  • Enhance system resilience and reliability, or
  • Database migrations. E.g., from on-prem to cloud, from SAP to BigQuery, etc.

I have been using Qlik Replicate to do Data Replication. Qlik Replicate is not an ETL (Extract-Transform-Load) tool, as its objective is to copy the data as it is.

Qlik Replicate functionality

Qlik Replicate moves data from approximately 40 different sources (including databases, files, mainframe, SAP and Salesforce) to one or several targets, of the same or a different flavour from the source. It supports more than 60 targets.

When Qlik Replicate starts loading data, it checks if the table exists in the destination, and if it doesn’t, it creates it.

Qlik Replicate can load data in the following modes:

  1. Full Load to make an identical copy,
  2. Change Data Capture (CDC) to keep the data up to date in real-time, and
  3. A combination of both.
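
To make these modes a bit more tangible, here is a minimal, conceptual sketch in Python of a Full Load: check whether the table exists in the target, create it if needed, and copy every row as it is. This is only an illustration of the idea (using SQLite in memory as a stand-in source and target), not how Qlik Replicate is implemented.

  import sqlite3

  # Stand-in source and target; in a real scenario these would be two different
  # systems (e.g., Oracle as the source and BigQuery as the target).
  source = sqlite3.connect(":memory:")
  target = sqlite3.connect(":memory:")

  # Sample data in the source.
  source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
  source.execute("INSERT INTO orders (id, amount) VALUES (1, 10.5), (2, 99.0)")
  source.commit()

  # Full Load: create the target table if it is missing, then copy all rows as-is.
  target.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, amount REAL)")
  rows = source.execute("SELECT id, amount FROM orders").fetchall()
  target.executemany("INSERT INTO orders (id, amount) VALUES (?, ?)", rows)
  target.commit()

  print(target.execute("SELECT COUNT(*) FROM orders").fetchone()[0], "rows replicated")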

Afterwards, if you need to transform the replicated data, you can use several tools:

  • A tool to build an analytical model, such as Qlik Compose.
  • ETL/ELT Solutions such as Google Dataflow (Apache Beam), Informatica, Azure Data Factory, etc.
  • Data Wrangling Tools such as Trifacta.
  • Scripting, etc.

Change Data Capture: a way of streaming data

Change data capture (CDC) refers to the process or technology for identifying and capturing changes made to a database. Qlik Replicate can apply those changes to another data repository or make them available in a format consumable by ETL, EAI (Enterprise Application Integration), or other types of data integration tools.

When you load data in CDC mode, any insert, update, or delete activity on the source is immediately replicated to the target.

Change Data Capture (CDC)
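
Conceptually, applying a CDC stream means replaying every captured change on the target in the order it happened on the source. The sketch below illustrates the idea with a made-up event format and a dictionary standing in for the target table; it is not Qlik Replicate’s internal mechanism.

  # Illustrative change events captured from the source (made-up format).
  changes = [
      {"op": "insert", "id": 3, "amount": 42.0},
      {"op": "update", "id": 1, "amount": 12.0},
      {"op": "delete", "id": 2},
  ]

  # Target table as a simple id -> amount map, already filled by a Full Load.
  target = {1: 10.5, 2: 99.0}

  # Replay the changes in source order so the target mirrors the source.
  for change in changes:
      if change["op"] in ("insert", "update"):
          target[change["id"]] = change["amount"]
      elif change["op"] == "delete":
          target.pop(change["id"], None)

  print(target)  # {1: 12.0, 3: 42.0}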

Use Cases

In this section, I show several data replication use cases.

Feed a Data Warehouse

You can load a Data Warehouse from different sources (databases, files, SAP, Salesforce, etc.) with Data Replication. If your solution supports both Full Load and CDC, as Qlik Replicate does, you can:

  1. Load data in real-time with CDC, and
  2. Make an initial synchronisation of a data set with a Full Load, or create a backup copy in another system for high availability or better data access.

You can keep a hybrid architecture and use Qlik Replicate to integrate the different elements, no matter where they are.

For example, you can feed your data warehouse hosted on BigQuery in real-time with data from:

  • SAP HANA on GCP
  • SAP-Oracle on-prem
  • SQL Server on-prem
  • Salesforce in a Salesforce-managed data centre
Qlik Replicate – Feed a data warehouse in the cloud

Continuing with my example, you could use the data in BigQuery for:

  • Reporting with Qlik Sense
  • BigQuery ML
  • Feeding Cloud SQL (MySQL), etc.
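
For instance, once the replicated tables land in BigQuery, any downstream consumer can query them directly. A minimal sketch with the google-cloud-bigquery client follows; the project, dataset, and table names are made up, and credentials are assumed to be configured already.

  from google.cloud import bigquery

  # Assumes GOOGLE_APPLICATION_CREDENTIALS (or similar) is already set up.
  client = bigquery.Client(project="my-project")  # hypothetical project

  # Hypothetical replicated table: my-project.replica.orders
  query = """
      SELECT customer_id, SUM(amount) AS total
      FROM `my-project.replica.orders`
      GROUP BY customer_id
      ORDER BY total DESC
      LIMIT 10
  """

  for row in client.query(query).result():
      print(row.customer_id, row.total)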

Send Information to an IoT System

Qlik Replicate can stream CDC data to a Kafka pipeline. Kafka can then message an IoT device (e.g., to change its configuration) or feed a database or a Hadoop system.

Qlik Replicate – Send data to an IoT device
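
To give a feel for the consuming side, here is a hypothetical sketch with the kafka-python library that reads change events published to a topic; the topic name, broker address, and message format are assumptions, not Qlik Replicate defaults.

  import json
  from kafka import KafkaConsumer

  # Hypothetical topic and broker that the replication task publishes to.
  consumer = KafkaConsumer(
      "replicate.orders",
      bootstrap_servers="localhost:9092",
      value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
      auto_offset_reset="earliest",
  )

  for message in consumer:
      change = message.value
      # Forward the change to an IoT device, a database, or a Hadoop system.
      print(change.get("operation"), change.get("data"))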

Backup Data to the Cloud

You can replicate data to the cloud to back it up. You can store files you want to keep in AWS S3, Azure Blob Storage, or Google Cloud Storage, or upload the data to a database.

Qlik Replicate – Backup data to the cloud
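
As a small, hypothetical example of the file case, this sketch uploads a replicated file to AWS S3 with boto3; the bucket name and paths are made up, and AWS credentials are assumed to be configured.

  import boto3

  # Assumes AWS credentials are available (environment variables, ~/.aws/credentials, etc.).
  s3 = boto3.client("s3")

  # Hypothetical local file produced by a replication task, and a target bucket.
  s3.upload_file(
      Filename="/data/replicated/orders_2021_06.csv",
      Bucket="my-backup-bucket",
      Key="backups/orders/orders_2021_06.csv",
  )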

Migration

When we discussed migrations, we said that a database contains data, the structures that hold the data (tables, documents, etc.), other objects, security rules, and so on.

Qlik Replicate can:

  1. Automatically create the objects with the right data types in the target, and
  2. Migrate data to a new database (e.g., from Oracle to BigQuery).

You will need to take further steps to migrate the rest of your ecosystem as per your requirements.
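
To illustrate the first point, creating objects in the target implies translating source data types into target equivalents. The sketch below uses a simplified, illustrative Oracle-to-BigQuery mapping; it is not Qlik Replicate’s actual mapping.

  # Simplified, illustrative Oracle -> BigQuery type mapping (not Qlik Replicate's rules).
  ORACLE_TO_BIGQUERY = {
      "VARCHAR2": "STRING",
      "NVARCHAR2": "STRING",
      "NUMBER": "NUMERIC",
      "FLOAT": "FLOAT64",
      "DATE": "DATETIME",
      "TIMESTAMP": "TIMESTAMP",
      "CLOB": "STRING",
      "BLOB": "BYTES",
  }

  def translate_column(name: str, oracle_type: str) -> str:
      """Return a BigQuery column definition for an Oracle column."""
      return f"{name} {ORACLE_TO_BIGQUERY.get(oracle_type, 'STRING')}"

  # Hypothetical source table and the DDL a tool could generate from it.
  columns = [("ID", "NUMBER"), ("CUSTOMER", "VARCHAR2"), ("CREATED_AT", "DATE")]
  ddl = "CREATE TABLE orders (\n  " + ",\n  ".join(translate_column(n, t) for n, t in columns) + "\n)"
  print(ddl)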

Useful links to work with QDI

General Information

Training Websites

Certifications & Qualifications

Qlik Data Integration portfolio
