Data Integrity Package

Ensuring data integrity is essential before conducting any further analysis. This package provides calculations that offer insights into the availability of data within a workspace.

Usage and Configuration

To perform the calculations, run the workflows from this package, starting with the workflow Data Integrity Entity Level. The schedules may be adapted to use different contexts.

Make sure to use a scope increment that aligns with the increment of the data in the workspace. Otherwise, the reported percentages won't be correct.

For example, if the workspace holds monthly data, run the workflows with a monthly scope increment.
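The effect of a mismatched increment can be illustrated with a minimal Python sketch (not the package's actual implementation): twelve monthly readings evaluated with a monthly increment yield full availability, while the same data evaluated with a daily increment appears almost entirely missing.

```python
from datetime import date

# Twelve monthly readings: one value on the first of each month in 2023.
points = [date(2023, m, 1) for m in range(1, 13)]
first, last = min(points), max(points)

# Monthly scope increment: 12 steps between first and last date, all with data.
monthly_total = (last.year - first.year) * 12 + (last.month - first.month) + 1
monthly_pct = 100.0 * len({(d.year, d.month) for d in points}) / monthly_total

# Daily scope increment over the same range: 335 steps, only 12 with data.
daily_total = (last - first).days + 1
daily_pct = 100.0 * len(set(points)) / daily_total

print(monthly_pct)          # 100.0
print(round(daily_pct, 1))  # 3.6
```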

To change the configuration of the package, follow the steps given in How do I configure packages pulled from Datagration's package repository.

Use the workspace value DataIntegritySignals to select the signals that should be analyzed by the workflows.

Use the workspace value DataLoadOffsetDays to configure the end of the scopes Start To Date Daily and Start To Date Monthly.

For example, if the data in the data sources lags by one day, set the workspace value to 1 to account for this offset.
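The offset can be pictured as shifting the scope end date back by the configured number of days; the sketch below is an illustrative assumption, not the package's actual scope logic.

```python
from datetime import date, timedelta

def scope_end(today, data_load_offset_days):
    """End date of the Start To Date scopes, shifted back by the offset."""
    return today - timedelta(days=data_load_offset_days)

# Sources lag one day: with DataLoadOffsetDays = 1 the scope ends yesterday,
# so the not-yet-loaded current day is not reported as missing data.
print(scope_end(date(2024, 5, 10), 1))  # 2024-05-09
```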

Use the workspace value DataIntegrityImportNotificationsEntitySets to specify the entity sets for which data import issues should be detected and notifications created.

Use the workspace value DataIntegrityImportNotificationsReportingEntity to specify the entity on which the notification content should be stored. It is recommended to select the workspace's root entity.

Details

Workflow Data Integrity Entity Level

This workflow executes a corresponding script that retrieves the first and last dates when the specified signal was detected. Using this date range, it calculates the total number of steps based on the defined scope increment. Additionally, it counts the time steps with available data and computes the signal's availability as a percentage. The resulting metrics are stored in the reference table Data Integrity Per Entity and are displayed on the Data Integrity dashboard.
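The calculation described above can be sketched in Python as follows. This is a simplified illustration assuming a daily scope increment, not the script shipped with the package; the hypothetical function `entity_availability` stands in for the actual script.

```python
from datetime import date

def entity_availability(signal_dates):
    """First/last date of the signal, total daily steps in that range,
    steps with available data, and availability as a percentage."""
    first, last = min(signal_dates), max(signal_dates)
    total_steps = (last - first).days + 1
    steps_with_data = len(set(signal_dates))
    return {
        "first": first,
        "last": last,
        "total_steps": total_steps,
        "steps_with_data": steps_with_data,
        "availability_pct": 100.0 * steps_with_data / total_steps,
    }

# Ten-day range with two missing days -> 80% availability.
dates = [date(2024, 1, d) for d in range(1, 11) if d not in (4, 7)]
print(entity_availability(dates))
```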

Workflow Data Integrity Parent Level

This workflow executes a corresponding script that calculates the total number of child entities associated with the processed entity. It also verifies whether each child has at least one available data point for the specified signal. The results, including the total number of children, the number of children with data, and the percentage of data availability, are recorded in the reference table Data Integrity Per Parent and are displayed on the Data Integrity dashboard.
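A minimal sketch of the parent-level calculation, again as an illustration rather than the package's actual script (the well names and `parent_availability` helper are hypothetical):

```python
def parent_availability(children_data):
    """children_data maps a child entity name to the list of data points
    available for the specified signal on that child."""
    total = len(children_data)
    with_data = sum(1 for points in children_data.values() if points)
    return {
        "children_total": total,
        "children_with_data": with_data,
        "availability_pct": 100.0 * with_data / total if total else 0.0,
    }

# Four child wells, two of which have at least one data point -> 50%.
wells = {"Well A": [41.2, 40.8], "Well B": [], "Well C": [39.5], "Well D": []}
print(parent_availability(wells))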

Workflow Data Integrity Notifications

This workflow tracks data import problems by monitoring signals that have not been available for a user-defined number of days. When an issue is detected, it generates a log entry that outlines the details of the problem. This log entry can then be utilized to send notifications to external applications, such as Microsoft Teams.

By default, the workflow is set up to use a daily increment and 5 steps for monitoring data issues. These settings can be changed using the package configuration (tab Workflows).
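The detection can be sketched as follows, assuming a daily increment and 5 steps as in the default configuration. The log wording is an assumption chosen to match the content filter documented below for Teams notifications; the helper names are hypothetical.

```python
from datetime import date, timedelta

def stale_entities(last_seen, today, steps=5):
    """Entities whose signal has not been reported within the last `steps` days."""
    cutoff = today - timedelta(days=steps)
    return sorted(entity for entity, seen in last_seen.items() if seen < cutoff)

def build_log_message(signal, entities):
    # Phrase chosen to match the documented notification content filter.
    return (f"Signal {signal} stopped being reported for the following entities: "
            + ", ".join(entities))

last_seen = {"Well A": date(2024, 5, 9), "Well B": date(2024, 5, 1)}
stale = stale_entities(last_seen, today=date(2024, 5, 10), steps=5)
if stale:
    print(build_log_message("oil rate", stale))
```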

To create notifications in Microsoft Teams, follow the steps laid out in Notifications and Notifications - MS Teams Webhook Connection. Choose the event On Log Messages Added and filter for the content "*stopped being reported for the following entities*".

If a large number of issues is identified, sending notifications to external tools may run into limitations, because the notification string and the associated data transfer can become excessively large.
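One possible mitigation, not part of the package itself, is to split a long entity list into batches so that each message stays below external size limits:

```python
def chunk_entities(entities, max_per_message=50):
    """Split a long entity list into batches of at most `max_per_message`."""
    return [entities[i:i + max_per_message]
            for i in range(0, len(entities), max_per_message)]

batches = chunk_entities([f"Well {i}" for i in range(120)], max_per_message=50)
print([len(b) for b in batches])  # [50, 50, 20]
```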

Data Integrity Dashboard

The package contains a dashboard that can be used to view the results of the analysis.

Alternatively, navigate directly to the reference tables to view the data.