Qlik Replicate allows restarting a Change Data Capture task (CDC) from a specific point in time. This feature let you capture all data changes as if the task has not run before – note that by default, when you restart a CDC task, it automatically copies all data changes since the last one it processed.
You may restart from a timestamp when recovering from an error. However, restarting a task from a timestamp should be used with caution as it may impact data consistency.
So to be on the same page, consistency ensures that any transaction will bring the target database from one valid state to another. It implies that any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination.
However, consistency does not guarantee that the transaction is correct in all possible ways the application programmer might have wanted. It is the responsibility of the developer to create the tasks in a way that ensure correctness.
Programming errors may not result in the violation of any defined business rules or requirements.
Coming back to restarting a task from a timestamp, we must do it in the exact moment that previously stopped so we are sure that there are no changes in the source, so we warranty consistency. To make it easier, it is better if you stop and restart the task in a moment when the source system has low activity.
If we restart a task from a timestamp after the moment it previously stopped, we lose the changes in data that occurred in the source. I.e. we don’t replicate them to the target. We lose consistency and create a data quality issue.
If we restart a task from a timestamp before the moment it previously stopped, we duplicate the changes in data that occurred in the source. I.e., we copy them twice. Again, we lose consistency and create a data quality issue.
To know when you stopped the task (t1), check the field SOURCE_TIMESTAMP_APPLIED in the Replication Status table.
On another note, when we restart a replication task, Replicate can do it in three different ways:
- If it can find the information in the source database of the eldest transaction still active, then Replicate starts reading here.
- If it can find the information in the source database of all open transactions, then Qlik Replicate waits for them to complete.
- Suppose it can’t proceed with any of the options above. In that case, Qlik Replicate waits an arbitrary amount of time (usually should be the same as the duration of the longest-running transaction expected).
In any of the three ways, Qlik Replicate ensures that when we see a commit record in the transaction log, it will be for a transaction that has the beginning recorded. Therefore Qlik Replicate will be able to apply it to the full-load data. I.e. Qlik Replicate will be able to apply the transaction to the corresponding record in the target.
Now, we can resume our CDC task from a timestamp. We click on Run [icon name=”arrow-right” prefix=”fas”] Advanced run options [icon name=”arrow-right” prefix=”fas”] Date and Time. Here we put the timestamp.
In brief, if you believe:
- There won’t be issues with error handling and PK enforcement in any of the tables of the task,
- You know when the tables were loaded for the last time, and
- You are positive there weren’t more changes in any table,
THEN you might start loading from a timestamp on the task.
Additionally, if you decide to load from a timestamp, you should do it when the data volume is low and the ecosystem is quiet.
Finally, bear in mind that restarting from a timestamp makes a dummy full load, and it replicates new metadata and Object ID – every table has its Object ID in Qlik Replicate’s metadata. If you create a table with the same name as an old one, it will have a different Object ID.
Note: You could restart from an Source Change Position depending on the source endpoint.
Leave a Reply