This post describes good practices your Operations team should follow to monitor Qlik Replicate tasks and raise alerts when needed.
As you already know, Qlik Replicate is a general-purpose data movement tool used to design solutions for various use cases involving technologies and data platforms that span different hosts, operating systems and deployment architectures. It lets you load and stream data from a source store to a target one. Additionally, it captures data changes and DDL (Data Definition Language) changes in the source and replicates them to the target.
Qlik Enterprise Manager centralises the management of Replicate operations. It lets you control, monitor and alert on Replicate tasks and workflows across multiple Replicate servers.
High-level architecture
The diagram below shows the high-level architecture that we will use as an example in this article.
As the ecosystem evolves, the diagram should be updated and kept handy so your Operations team can monitor and alert on the ecosystem. Remember that this post explains how to monitor and alert on Qlik Replicate tasks only. It does not make recommendations about other elements of the ecosystem.
Use the appropriate architecture for your scenario.
Notifications and alerts
The easiest way to keep track of any issue during the replication is to set up notifications in Replicate or the Enterprise Manager.
The Enterprise Manager documentation covers messages and alerts in detail. Additionally, you can use the Replicate notifications.
In this section, we explain when and how to use each method.
Windows Event Log
Qlik recommends setting up Windows Event Log alerts in the Enterprise Manager for task and data errors.
You can set up alerts at server and task levels.
You can find the Event IDs sent to Event Log here.
The example below shows how to set up an alert in the Enterprise Manager portal and configure it to send e-mails.
syslog
The Replicate Server is on Linux in this case, as seen in the High-level architecture section. If you have a tool to view syslog messages, use Replicate notifications.
The Replicate notifications are set up at the server level. You can define notifications for tasks or server events.
An example is shown below.
e-mail alerts
Qlik recommends setting up e-mail alerts for task and data errors in Enterprise Manager or Replicate.
The examples in the Windows Event Log and syslog sections show how to add e-mail alerts to your notifications.
Message Center
The Message Center in the Enterprise Manager also allows error messages and notifications to be reviewed.
Open the Enterprise Manager and click on the arrow on the “Message Center” box at the bottom left corner to access the Message Center.
The Message Center occupies the bottom half of the screen. You can expand it to make it full-screen.
The information displayed is:
- Error Messages: The system will automatically recover from some errors. However, you must take action on the non-recoverable ones. Also look for patterns, such as a connection loss at the same time each day.
- Warnings: Warning messages usually require no action. They show information about the data or the state of the data. E.g., if you stop the Log Stream task, the secondary task will issue the warning “Warning no data capture” until you restart the Log Stream task.
- Notifications: If you think you have missed a notification, check here.
- Informational: Events such as an alert returning to normal or a Full Load task completing are flagged as informational messages.
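To spot patterns such as a connection loss at the same time each day, you can group the error timestamps by hour of the day. The following is a minimal sketch: it assumes you have already copied or exported the timestamps of the error messages, and the function name `recurring_error_hours` and the `min_days` threshold are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def recurring_error_hours(timestamps, min_days=3):
    """Return the hours of the day (0-23) in which errors occurred on at
    least `min_days` distinct days, mapped to the number of days affected."""
    # Count each (date, hour) pair once so that a burst of errors on a
    # single day does not look like a daily pattern.
    day_hour_pairs = {(ts.date(), ts.hour) for ts in timestamps}
    days_per_hour = Counter(hour for _, hour in day_hour_pairs)
    return {hour: days for hour, days in sorted(days_per_hour.items())
            if days >= min_days}


# Hypothetical timestamps taken from error entries.
errors = [
    datetime(2021, 6, 1, 2, 5),
    datetime(2021, 6, 2, 2, 7),
    datetime(2021, 6, 3, 2, 4),
    datetime(2021, 6, 3, 14, 30),
]
print(recurring_error_hours(errors))  # {2: 3}
```

In this sample, the errors around 02:00 happen on three different days, which suggests a scheduled event (a backup, a network maintenance window, etc.) worth checking with the owners of the source or target system.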
Scripting for monitoring and reporting
Qlik Enterprise Manager can be accessed through .NET, REST or Python APIs. These APIs let you automate the operational aspects of your tasks and review historical trends, including performance and resource data, which gives a broader perspective when you compare and analyse performance issues.
If you have a centralised solution to monitor all systems, applications and ecosystems in your company (e.g., Splunk, a cloud console, etc.), you can integrate Qlik Replicate into it through any of these APIs.
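As an illustration of that kind of integration, the sketch below turns a snapshot of task statuses into JSON lines that a log collector could ingest. It does not call any Qlik API: the shape of the task records (`server`, `name`, `state`, `latency_seconds`) and the 300-second threshold are assumptions for the example, so map them to whatever the API you choose actually returns.

```python
import json


def tasks_needing_attention(tasks, max_latency_seconds=300):
    """Yield one alert per task that is in error or lagging too far behind.

    `tasks` is an iterable of dicts with the hypothetical keys
    "server", "name", "state" and "latency_seconds".
    """
    for task in tasks:
        if task["state"] == "ERROR":
            reason = "task in error state"
        elif task["latency_seconds"] > max_latency_seconds:
            reason = "latency above threshold"
        else:
            continue  # healthy task, nothing to report
        yield {"server": task["server"], "task": task["name"], "reason": reason}


def to_json_lines(alerts):
    """Serialise alerts as one JSON object per line."""
    return "\n".join(json.dumps(alert, sort_keys=True) for alert in alerts)


# Hypothetical snapshot; in practice you would build it from an API response.
snapshot = [
    {"server": "replicate-01", "name": "orders_cdc",
     "state": "RUNNING", "latency_seconds": 12},
    {"server": "replicate-01", "name": "billing_cdc",
     "state": "RUNNING", "latency_seconds": 900},
    {"server": "replicate-02", "name": "crm_full_load",
     "state": "ERROR", "latency_seconds": 0},
]
print(to_json_lines(tasks_needing_attention(snapshot)))
```

Keeping the filtering logic separate from the API call makes it easy to test and to reuse, whichever of the three APIs you end up using.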
Most Common Examples of Actions Based on Notifications
Latency Notification
If a task has high latency, you must first determine which workflow element causes the delay.
Open the Replicate Console.
Open the task that is suffering the latency, and click on Monitor.
Click on Change processing –> Incoming Changes.
If the first two columns (In-Memory and On-Disk) are blue, the source system is causing the delay. Qlik Replicate is waiting on the source to commit the transactions.
If the second two columns (In-Memory and On-Disk) are blue, the target system is causing the delay. Qlik Replicate is waiting on the target to apply the changes.
Once you have determined where the latency originates, raise the logging level on the appropriate component.
- If the source is causing the delay, set the logging to the appropriate trace level on SOURCE_CAPTURE and SORTER.
- If the target is causing the delay, set the logging to the appropriate trace level on TARGET_APPLY.
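The rule of thumb above can be written down as a small helper for your runbook. This is only a sketch: the four arguments are the values you read from the Incoming Changes monitor, the function name is made up, and treating any non-zero value in a pair of columns as “blue” is an assumption on my side.

```python
def components_to_trace(source_in_memory, source_on_disk,
                        target_in_memory, target_on_disk):
    """Suggest which logging components to raise, based on where the
    incoming changes are piling up."""
    components = []
    # First two columns: changes waiting on the source to commit.
    if source_in_memory + source_on_disk > 0:
        components += ["SOURCE_CAPTURE", "SORTER"]
    # Second two columns: changes waiting to be applied to the target.
    if target_in_memory + target_on_disk > 0:
        components.append("TARGET_APPLY")
    return components


print(components_to_trace(120, 30, 0, 0))  # ['SOURCE_CAPTURE', 'SORTER']
print(components_to_trace(0, 0, 45, 0))    # ['TARGET_APPLY']
```

If both sides show pending changes, the helper returns all three components, which is a hint to review source and target together.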
Review the new logs to find out:
- Whether the last read commit point in the source differs from the current time.
- Which statement is taking the longest in the target.
Non-Recoverable Error
First, look for the error message in the task log.
If the message is easy to understand, such as “Wrong password”, correct the issue and resume the task.
If the error message is a timeout or difficult to interpret, resume the task. If it was a temporary issue, the task should continue normally. If it doesn’t restart, you will need to investigate further.
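If you want to capture this decision in a script or runbook, a minimal sketch could look like the one below. The list of “clear” phrases is entirely hypothetical (only “Wrong password” comes from the example above); extend it with the messages you actually see in your own task logs.

```python
# Hypothetical phrases that usually point to a problem you can fix directly.
CLEAR_CAUSES = ("wrong password", "login failed", "permission denied")


def triage(error_message):
    """Return "fix_then_resume" for self-explanatory errors and
    "resume_and_watch" for timeouts or messages that are hard to interpret."""
    text = error_message.lower()
    if any(phrase in text for phrase in CLEAR_CAUSES):
        return "fix_then_resume"
    return "resume_and_watch"


print(triage("Wrong password for user repl_user"))    # fix_then_resume
print(triage("Timeout expired while reading source"))  # resume_and_watch
```

A task that falls into the second group and still does not restart after you resume it needs further investigation.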
Suspended Tables
A “suspended table” error occurs when there is an issue with source or target systems.
For example, if the source endpoint can’t confirm if the transactions were continuously logged in a table, there is potentially a gap in the logging.
Another example is when a table can’t grow anymore in the target, i.e. there is no space for that table to store more data.
The best way to recover from a “suspended table” error is to reload the offending table.
However, on some occasions you can’t reload, especially with huge tables. In these cases, there is another procedure. Be cautious when applying this second option: you risk losing data if you do not do it carefully and at a time when there is little activity in the source system. The procedure is as follows:
- Put the table in a separate task just for it,
- Start the new task from the moment before the table was suspended,
- Bring the table to the concurrent state,
- Stop the main task, and
- Start the main task from the moment before you stopped it.
See the “Control Tables” section.
Control Tables
Qlik Replicate provides a set of Control Tables. They contain Replicate metadata about the replication tasks and their statistics.
To enable the Control Tables, open the task you want to monitor, and click on “Task Settings”.
Then select “Control Tables” in the “Metadata” section. Define where to create the control tables and the frequency to store control metadata. By default, the tasks always collect Apply Exceptions. You can select the other control tables you want to populate, e.g. Suspended Tables, and click OK.
The first time you enable the Control Tables for a replication task, Qlik Replicate creates them. As you enable the Control Tables in other tasks, Qlik Replicate appends data to the existing Control Tables.
In the latest Qlik Replicate versions (at the time of writing this post, the newest version was 2021.5), Qlik usually recommends collecting the Suspended Tables only, unless Qlik advises you otherwise in a particular scenario. You will find the historical metadata about the execution and status of the tasks in the Qlik Enterprise Manager analytics.
You can set up a notification in Enterprise Manager (see the “Notifications and alerts” section) for when a table replication is suspended due to an error (see the “Suspended Tables” section). If you receive this notification, check the attrep_suspended_tables table. Once you have fixed the issue, you can delete the entry from the attrep_suspended_tables table.
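Checking and cleaning the attrep_suspended_tables table can be scripted. The sketch below uses an in-memory SQLite database as a stand-in for your target, so it runs anywhere; against a real target you would use that database's own driver and parameter style. The column names are an assumption based on my reading of the control table layout, so verify them against the table in your environment. The function names and sample values are made up.

```python
import sqlite3


def suspended_tables(conn, task_name):
    """Return (owner, table, reason, suspended_at) rows for one task."""
    cursor = conn.execute(
        "SELECT TABLE_OWNER, TABLE_NAME, SUSPEND_REASON, SUSPEND_TIMESTAMP "
        "FROM attrep_suspended_tables "
        "WHERE TASK_NAME = ? "
        "ORDER BY SUSPEND_TIMESTAMP",
        (task_name,),
    )
    return cursor.fetchall()


def clear_suspended_entry(conn, task_name, table_owner, table_name):
    """Delete the entry for one table once the issue is fixed.
    Returns the number of rows removed."""
    cursor = conn.execute(
        "DELETE FROM attrep_suspended_tables "
        "WHERE TASK_NAME = ? AND TABLE_OWNER = ? AND TABLE_NAME = ?",
        (task_name, table_owner, table_name),
    )
    conn.commit()
    return cursor.rowcount


def demo_connection():
    """Build an in-memory stand-in for the target database (assumed columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE attrep_suspended_tables ("
        "SERVER_NAME TEXT, TASK_NAME TEXT, TABLE_OWNER TEXT, "
        "TABLE_NAME TEXT, SUSPEND_REASON TEXT, SUSPEND_TIMESTAMP TEXT)"
    )
    conn.execute(
        "INSERT INTO attrep_suspended_tables VALUES (?, ?, ?, ?, ?, ?)",
        ("replicate-01", "orders_cdc", "SALES", "ORDERS",
         "example reason", "2021-06-03 02:04:00"),
    )
    conn.commit()
    return conn


conn = demo_connection()
for row in suspended_tables(conn, "orders_cdc"):
    print(row)
print(clear_suspended_entry(conn, "orders_cdc", "SALES", "ORDERS"))
```

The suspend timestamp returned by the query is also the reference point you need if you have to restart a task from the moment before the table was suspended.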
Reporting
The Replicate task monitor and the task screens in the Enterprise Manager show how a task is currently performing.
Qlik Enterprise Manager stores historical information about Qlik Replicate tasks and workloads in a Postgres Analytics database. This database allows the Enterprise Manager to provide reporting functionality to see the historical trend.
The analytics information is in the Analytics tab.
Select the chart you want; it shows the history of that resource.
The charts show the actual values and the average of the following resources:
- Memory
- Disk Usage
- Replicate CPU
- Machine CPU
- Task CPU
- Apply Throughput
- Tables
- Records
- Apply Changes
The charts allow you to drill down to the information you are after and choose filters and time ranges to limit the information displayed in the charts.
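If you export the data behind a chart, you can reproduce the “actual versus average” view for a given time range in a few lines. This is a minimal sketch, not how the Enterprise Manager computes its charts: the `(timestamp, value)` sample format and the function name `summarize` are assumptions, and the samples are expected in chronological order.

```python
from datetime import datetime
from statistics import mean


def summarize(samples, start, end):
    """Summarise the (timestamp, value) samples that fall within
    [start, end]. Returns None when the range contains no samples."""
    values = [value for ts, value in samples if start <= ts <= end]
    if not values:
        return None
    return {"latest": values[-1], "peak": max(values), "average": mean(values)}


# Hypothetical memory readings (in MB) for one task.
memory_mb = [
    (datetime(2021, 6, 3, 9, 0), 400),
    (datetime(2021, 6, 3, 10, 0), 600),
    (datetime(2021, 6, 3, 11, 0), 800),
]
print(summarize(memory_mb, datetime(2021, 6, 3, 9, 0), datetime(2021, 6, 3, 10, 0)))
```

Comparing the latest value with the average over a longer range is a quick way to tell a one-off spike from a sustained trend.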
Additionally, you can create your own reports in Qlik Sense or any other Business Intelligence tool by reading the PostgreSQL database on the Enterprise Manager server, which stores the operations information and metadata. Here is an example of building Qlik Enterprise Manager Analytics in Qlik Sense. If you want to use this example in a scenario where you are replicating data into Qlik Sense SaaS with the new Hybrid Data Delivery service, you must:
- Fulfil the server-side Qlik Replicate requirements for PostgreSQL as a source, including:
  - PostgreSQL drivers on the Qlik Replicate server, and
  - Replication allowed in the PostgreSQL configuration.
- Set up a Qlik Replicate task to push the QEM Analytics database to Qlik Cloud Landing.
- Create landing and storage assets in Qlik Sense SaaS.
- Update LOADPREFIX and QVDLOCATION variables to point to the files in the storage area. Add .public to LOADPREFIX. Remove the trailing _ from PREFIX so the variables look like the ones below:
SET LOADPREFIX="QEM Analytics Storage/public.";
SET QVDLOCATION="lib://Data Space:DataFiles/";
SET PREFIX= '$(QVDLOCATION)$(LOADPREFIX)';
Look outside Qlik Replicate
Qlik Replicate sits in the middle of the data exchange between source and target, so it is often the component that reports an error or performance issue. That does not make it the root cause of the problem. Still, the error messages and warnings provided by Qlik Replicate should point you in the right direction, towards a probable cause in one or more pieces of the puzzle.
Note also that several of the actions in the “Most Common Examples of Actions Based on Notifications” section involve fixing something in the source or target systems.
Finally, it is an excellent practice to implement a Data Reconciliation methodology as part of the operations and monitoring of the replication process. There are no specific reports for Data Reconciliation in Qlik Replicate. The right approach depends on the source and target systems, their data lifecycle, and how you replicate the data. Here is a post with several considerations to help you choose your Data Reconciliation procedure.
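As a very simple illustration, the sketch below compares row counts between source and target for a list of tables. Row counts are only one possible check, chosen here for brevity, and not a complete reconciliation method. Two in-memory SQLite databases stand in for the real systems, and the function and table names are made up for the example.

```python
import sqlite3


def row_count(conn, table):
    """Count the rows of one table."""
    # Table names cannot be bound as query parameters, so accept plain
    # identifiers only before building the statement.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


def reconcile_counts(source, target, tables):
    """Return {table: (source_rows, target_rows)} for every mismatch."""
    mismatches = {}
    for table in tables:
        source_rows = row_count(source, table)
        target_rows = row_count(target, table)
        if source_rows != target_rows:
            mismatches[table] = (source_rows, target_rows)
    return mismatches


def demo_database(order_ids):
    """Build an in-memory stand-in database with a single orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in order_ids])
    conn.commit()
    return conn


source = demo_database([1, 2, 3])
target = demo_database([1, 2])
print(reconcile_counts(source, target, ["orders"]))  # {'orders': (3, 2)}
```

On a system with ongoing changes, counts taken at different moments will differ even when replication is healthy, so run this kind of check when activity is low or compare up to a common cut-off time.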