Optimise performance in Qlik Replicate

Qlik Replicate sits in the middle of the data exchange between source and target. For this reason, we shouldn’t presume that Qlik Replicate itself is the root cause of an error or performance issue.

Thus, if we want to tune our data replication process, we must break a replication task down into its distinct components and analyse how each contributes to the overall replication performance.

For the sake of this discussion, let’s use the architecture below as an example.

[Image: Qlik Replicate use case - replication from Oracle to Kafka]

In our example, the factors that contribute to the overall performance of the data replication process are:

  • The source database, including its tables (Oracle in our architecture).
  • The network connection from the source database to the Replicate server, and from the Replicate server to the target database or Kafka.
  • The target database.
  • Qlik Replicate server resources.
    • The server must be configured to handle all the tasks and loads it will be running.
  • The source and target database/Kafka drivers or APIs.
  • The tuning of Replicate server tasks.

The scope of this post is to describe performance optimisation best practices for a Qlik Replicate server from a general standpoint.

Qlik Replicate location

It is usually more efficient, in terms of performance, to keep Qlik Replicate close to the source. If several sources are in use, Qlik Replicate should be near the source that sends the most data.

Parallel Loads

You should initially set the number of parallel segments equal to the number of cores on the Replicate server. Then tweak this number until you find the sweet spot, which is influenced by the workload, its complexity, and the available resources on the Replicate server.
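
As a trivial illustrative sketch (Python, purely an assumption about how you might derive the starting value):

```python
import os

# Baseline suggested above: one parallel load segment per CPU core.
# Treat this only as a starting value; tune it up or down based on the
# workload, table complexity, and free resources on the Replicate server.
initial_segments = os.cpu_count() or 4  # fall back to 4 if undetectable
print(f"Initial number of parallel load segments: {initial_segments}")
```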

The number of tables per task and its impact on performance

There is no hard limit on the number of tables in a single Replicate task. In general, it is good practice to limit it to 2,000 tables. However, some companies have put up to 20,000 tables in a single task on a particular project, which I do not recommend: it is hard to manage, negatively impacts performance, and exhausts resources.

Tuning Replicate Server tasks

This tuning activity is core to optimising the Replicate server and is a regular exercise for a Replicate Developer or an Operations person.

There are scenarios where a Replicate server has plenty of memory and the tasks still experience latency, even for average volumes of data. The main factor is often that the performance settings need adjusting at the task level.

You can adjust the following settings:

Full Load Tuning

You must configure the maximum number of tables to load in parallel, considering how many SELECT statements can run simultaneously on the source database and the bandwidth between the source database and the Replicate server.

A Full Load consumes resources on the source database. When it starts and how long it runs influence the resources available to other processes and queries at the source. If you need to run a vast load, avoid doing it during the business window.

Use the Parallel Load when possible.

The commit rate can significantly impact targets where big data batches are written to a temporary place (e.g., memory or disk) before they land in the final storage. Making fewer round trips in these cases is more efficient, so you should use bigger batches/files.
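
A quick back-of-the-envelope sketch (the row count and commit rates below are made-up numbers) shows how a larger commit rate cuts down round trips:

```python
# Hypothetical full load of 10 million rows: a larger commit rate means
# fewer, bigger batches and therefore fewer round trips to the target.
total_rows = 10_000_000
for commit_rate in (10_000, 50_000, 100_000):
    round_trips = -(-total_rows // commit_rate)  # ceiling division
    print(f"commit rate {commit_rate:>7,} -> {round_trips:>5,} round trips")
```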

[Image: Qlik Replicate Tuning 1]

Change Processing Tuning

Many factors determine the performance of Change Data Capture (CDC) and the speed of applying the changes to the target. In this section, we will discuss some key factors that impact CDC.

Volume and velocity of changes

Latency is usually heavily influenced by the number of changes that the source generates at a specific moment.

Even though Qlik Replicate can process many changes on a sustained basis, there are times when the source floods the logs. As a consequence, we observe latency.

It is essential, though, to consider latency during the design phase and to set expectations accordingly.

For example, suppose a database has 100 tables, and only 10 of them have a very high volume of fast-moving changes. In that case, those ten tables should go into a separate task, tuned to avoid latency during peak volumes.
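
As a sketch of that design decision (the table names, change rates, and threshold below are all hypothetical):

```python
# Hypothetical change rates (changes per minute) per table.
change_rates = {
    "ORDERS": 120_000,
    "ORDER_ITEMS": 95_000,
    "CUSTOMERS": 300,
    "PRODUCTS": 40,
}

HOT_THRESHOLD = 10_000  # assumed cut-off for "very high volume"
hot_task = sorted(t for t, rate in change_rates.items() if rate >= HOT_THRESHOLD)
standard_task = sorted(t for t in change_rates if t not in hot_task)
print("Separate, specially tuned task:", hot_task)  # ['ORDERS', 'ORDER_ITEMS']
print("Standard task:", standard_task)              # ['CUSTOMERS', 'PRODUCTS']
```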

Method to read sources

Changes can be read from the source either with Qlik Replicate’s proprietary method or with the database-provided APIs. In Oracle, for example, the Replicate Log Reader is always faster than LogMiner.

Likewise, ODBC is always slower than native database connections (e.g., APIs).

Transaction characteristics on the source

Replicate applies a transaction to the target only once it sees its commit record. For example, if there are 20 million changes in a 30-minute timeframe and all of them are committed at the 30th minute, Qlik Replicate will start offloading these changes from memory to the data folders, depending on the configuration. This increases latency because the page-to-disk operation is resource-intensive and slow.
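
A rough sketch of the memory arithmetic behind that example (the average change size and memory budget are assumptions, not Replicate internals):

```python
# 20 million changes are held back until the commit record arrives.
changes_in_txn = 20_000_000
avg_change_size_bytes = 100   # assumed average size of a single change record
memory_limit_mb = 1_024       # assumed task memory budget

needed_mb = changes_in_txn * avg_change_size_bytes / (1024 * 1024)
spilled_mb = max(0.0, needed_mb - memory_limit_mb)
print(f"~{needed_mb:,.0f} MB needed; ~{spilled_mb:,.0f} MB paged to the data folders")
```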

Target Apply Mode (change processing mode)

There are two modes: batch optimised apply and transactional apply. Use the batch-optimised mode to get a significant performance boost by writing changes in bulk to the target. For instance, if there were 10,000 changes to 3 tables in 1,000 transactions, transactional apply performs 10,000 operations against the target, while batch-optimised apply handles the same changes with a maximum of 1 + 3×3 = 10 operations, and a minimum of 1 + 3 = 4 operations if it can use a SQL MERGE.
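
The operation counting above can be written out explicitly (a sketch of the arithmetic, not of Replicate’s internals):

```python
changes, tables = 10_000, 3

transactional_ops = changes      # transactional apply: one operation per change
batch_max_ops = 1 + 3 * tables   # 1 bulk load + up to INSERT/UPDATE/DELETE per table
batch_min_ops = 1 + tables       # 1 bulk load + one MERGE per table

print(transactional_ops, batch_max_ops, batch_min_ops)  # 10000 10 4
```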

However, there are scenarios where the batch apply mode is unavailable or irrelevant: for example, tables without a primary key, or streaming sources/targets such as Kafka. In these cases, you should use transactional apply.

The high-level flow of processing a set of changes

In batch apply mode, Qlik Replicate performs three steps to process a batch of changes:

  1. Keep collecting (accumulating) the incoming changes (one or more transactions; changes + commit = a batch).
  2. Process the changes (simultaneously, in memory or on disk).
  3. Apply the changes.

Batch Tuning in Batch Apply Mode (step 1)

The “Longer than (seconds)” parameter (see image below) indicates how long you want Qlik Replicate to keep capturing changes before applying them. A higher “Longer than (seconds)” value decreases the frequency of writes to the target and, as a consequence, increases the size of the batches.

The “But less than (seconds)” value is the maximum amount of time to wait between each application of batch changes (before declaring a timeout); in other words, it is the maximum acceptable latency.

To ensure a single task does NOT use all available memory when there is a sudden surge of changes, we use the ‘Force apply a batch when processing memory exceeds (MB)’ property. Even while Qlik Replicate keeps collecting changes, the size of the accumulated batch will NOT exceed this value.
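
Putting the three settings together, the decision behaves roughly as sketched below. This is a simplification: the constant names mirror the UI labels, the values are assumptions, and `target_busy` is a stand-in for whatever else keeps Replicate from applying sooner.

```python
LONGER_THAN_S = 1      # "Longer than (seconds)": minimum accumulation time
BUT_LESS_THAN_S = 30   # "But less than (seconds)": maximum acceptable latency
FORCE_APPLY_MB = 500   # "Force apply a batch when processing memory exceeds (MB)"

def should_apply_batch(elapsed_s: float, batch_size_mb: float, target_busy: bool) -> bool:
    """Simplified view of when an accumulated batch is applied to the target."""
    if batch_size_mb >= FORCE_APPLY_MB:
        return True         # memory guard: flush immediately
    if elapsed_s < LONGER_THAN_S:
        return False        # minimum accumulation time not yet reached
    if elapsed_s >= BUT_LESS_THAN_S:
        return True         # maximum acceptable latency reached
    return not target_busy  # between the bounds, apply when the target is free
```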

[Image: Qlik Replicate Tuning 2]

Batch Tuning in Transaction Apply Mode (step 1)

Qlik Replicate applies changes one transaction after another in transaction commit order. For example, 1000 inserts + 200 updates + commit = 1 transaction.

Qlik Replicate collects the transactions in “batches” during a timeframe that we define in the “Maximum time to batch transactions before applying (seconds)” (see the image below).

Qlik Replicate applies the changes to the target either when the number of changes is equal to or greater than the “Minimum number of changes per transaction” or when the batch timeout value is reached, whichever occurs first. For example, if you set the “Minimum number of changes per transaction” to 1,000 and update 2,000 records in a single transaction, Qlik Replicate will process 2,000 records in one batch and then process the next batch.
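
A minimal sketch of that rule (the constant names mirror the UI labels; the values are assumptions):

```python
MIN_CHANGES_PER_TXN = 1_000  # "Minimum number of changes per transaction"
BATCH_TIMEOUT_S = 60         # "Maximum time to batch transactions before applying (seconds)"

def should_apply_transactions(pending_changes: int, elapsed_s: float) -> bool:
    # Whichever condition is met first triggers the apply.
    return pending_changes >= MIN_CHANGES_PER_TXN or elapsed_s >= BATCH_TIMEOUT_S

# The example from the text: 2,000 changes already exceed the 1,000 threshold,
# so all 2,000 records are processed as one batch.
print(should_apply_transactions(pending_changes=2_000, elapsed_s=5))  # True
```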

[Image: Qlik Replicate Tuning 3]

Transaction Offload Tuning (step 2)

While Qlik Replicate collects the incoming changes, it simultaneously processes them in memory, keeping each open transaction there for up to the interval defined by “Transaction duration exceeds (seconds)”.

Qlik Replicate allocates a specific amount of memory to process the changes collected in step 1 (under either “Batch Tuning in Batch Apply Mode” or “Batch Tuning in Transaction Apply Mode”). You define this amount of memory in “Total transactions memory size exceeds (MB)” (see below).

Note that Qlik Replicate will continue processing the batch of data collected in step 1. However, if either of these two thresholds is exceeded, it will offload changes to disk, significantly increasing latency. This is particularly noticeable with large batches, as Qlik Replicate must page to disk continuously to process the changes.
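
The two offload triggers can be summarised in a short sketch (the constant names mirror the UI labels; the values are assumptions):

```python
TXN_DURATION_EXCEEDS_S = 60   # "Transaction duration exceeds (seconds)"
TOTAL_TXN_MEMORY_MB = 1_024   # "Total transactions memory size exceeds (MB)"

def offload_to_disk(open_txn_age_s: float, txn_memory_mb: float) -> bool:
    # Either condition pages changes to disk: the resource-intensive path
    # that significantly increases latency, especially for large batches.
    return (open_txn_age_s >= TXN_DURATION_EXCEEDS_S
            or txn_memory_mb >= TOTAL_TXN_MEMORY_MB)
```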

[Image: Qlik Replicate Tuning 4]

Specific Optimisation Opportunities for Endpoints

Depending on the system we connect to as a source or target, the specific endpoint for that connection may offer particular optimisation opportunities, such as the Chunk Size setting in the Salesforce endpoint.
