This post explains how to efficiently monitor and alert on potential issues when running CDC with Qlik Replicate.
Category: Data Integration
As data ecosystems evolved to offer more functionality, more flexibility and new technologies, Data Integration became increasingly critical to enabling all of those possibilities.
Data Integration is the means to eliminate data silos and leverage all the data’s potential for an organization.
To clarify, Data Ingestion comprises Data Acquisition, Data Landing and Data Preparation.
First, Acquisition is the method of extracting or receiving a defined data set before further consumption. It covers how we obtain the data and what characteristics the data has.
Landing is the act of unloading a data set on a target database or storage system.
As for Data Preparation, it is the process of making the data ready for use.
Ideally, we keep data as-is when we move it: we first acquire the data and, once it has landed in the target, we prepare it for consumption. This way, we eliminate the business and technical problems that inaccurate, contradictory and inconsistent data cause.
However, some scenarios require minimal transformations to make the ingestion technically possible, such as reformatting a date (15.3.22) so that all dates share one format (15/3/2022). In other cases, we prefer to include transformations to ensure a certain level of data quality when the data reaches the target (e.g., if we receive records as “Vanilla Inc”, “vanila”, or “vanilla In”, we may unify them to reduce duplication).
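The two minimal transformations mentioned above can be sketched in a few lines of Python. This is only an illustration using the standard library, not how any particular tool does it: the list of source date formats, the `CANONICAL_NAMES` reference list, the function names and the 0.6 similarity cutoff are all assumptions made for this example.

```python
from datetime import datetime
from difflib import get_close_matches

# Assumed source formats, tried in order (hypothetical for this example).
DATE_FORMATS = ("%d.%m.%y", "%d/%m/%Y", "%Y-%m-%d")

# Assumed reference list of clean company names (hypothetical).
CANONICAL_NAMES = ["Vanilla Inc"]


def normalize_date(raw: str) -> str:
    """Return the date as D/M/YYYY, trying each assumed source format."""
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue  # not this format, try the next one
        return f"{parsed.day}/{parsed.month}/{parsed.year}"
    raise ValueError(f"Unrecognized date format: {raw!r}")


def unify_company(raw: str, cutoff: float = 0.6) -> str:
    """Map a company name to its closest canonical spelling, if any.

    Names with no candidate above the similarity cutoff are returned
    unchanged, so unknown companies are never silently rewritten.
    """
    lookup = {name.lower(): name for name in CANONICAL_NAMES}
    matches = get_close_matches(
        raw.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else raw


if __name__ == "__main__":
    print(normalize_date("15.3.22"))
    for name in ("Vanilla Inc", "vanila", "vanilla In"):
        print(unify_company(name))
```

Keeping unmatched names as they are is a deliberate choice here: during ingestion it is usually safer to let an unknown value through than to guess at a correction.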
In this category, you will find how-to articles, tips, tricks, best practices and use cases for database migrations, replications, data copies, ETL/ELT, streaming and CDC (Change Data Capture). They focus on Data Acquisition. I only include a Data Preparation process when necessary to achieve the final result.
Fuzzy Matching Demo: Inconsistent Company Names
A fuzzy matching use case. We improve data quality when loading data into BigQuery using Trifacta software, which simplifies the process.
Fuzzy Matching or Approximate String Matching
Fuzzy matching is a technique used to match text strings that are similar but not exactly identical. We use it in web searches, data quality, etc.
Qlik Replicate: Restart from a Source Change Position
This post explains how to restart a Qlik Replicate CDC task from a specific System Change Number (SCN) when you need to recover from an error.
Qlik Replicate: Restart from a timestamp
This post explains how to restart a Qlik Replicate CDC task from a specific point in time when you need to recover from an error.
A brief introduction to BigQuery’s architecture
This post helps you understand BigQuery’s internal design and how to efficiently load, replicate or migrate data into it.