Reconciliation Checks¶
Alation Cloud Service Applies to Alation Cloud Service instances of Alation
Reconciliation checks compare data between two tables, typically a source (such as a transactional database) and a target (such as a data warehouse), to confirm both contain the same information. Unlike regular data quality checks, which validate the health of a single table, reconciliation checks validate that two tables are consistent with each other.
Note
Reconciliation checks are limited to SQL Server, Oracle, Snowflake and Databricks only.
Use reconciliation checks when data is copied, replicated, or transformed between systems and you need confidence that nothing was lost, duplicated, or silently changed along the way.
Reconciliation checks are classified under the consistency data quality dimension. They validate that data remains consistent across systems by confirming source and target tables contain the same information.
Best Practices¶
Keep the following best practices in mind when configuring reconciliation checks:
Start with metric reconciliation on row count as a fast, low-cost sanity check before running deeper checks.
Run schema reconciliation before content reconciliation. A missing or mismatched column on one side makes row-level comparison unreliable.
Use source and target filters to restrict the comparison to a specific data slice, for example yesterday’s data, when working with large tables or near-real-time pipelines.
Schedule content reconciliation checks less frequently than metric checks. Content reconciliation reads both full tables and is more resource-intensive on very large datasets.
A passing check means the result is within your configured tolerance, not byte-identical. Set thresholds that reflect your actual business requirements.
Common Use Cases¶
The following table shows typical scenarios where reconciliation checks provide business value, along with the recommended check type and configuration approach for each.
Scenario |
Recommended Check Type |
Configuration Approach |
|---|---|---|
Validate row counts after a data migration to confirm no records were lost |
Metric reconciliation on |
Use Percent difference type with a threshold of |
Confirm schema parity between source and target before a system cutover |
Schema reconciliation |
Use zero tolerance. The check passes only when all columns match. |
Detect dropped or corrupted records introduced by an ETL pipeline |
Content reconciliation |
Use Percent difference type with a threshold of |
Validate that financial aggregates such as revenue totals match across systems |
Metric reconciliation on |
Use Percent difference type with a threshold of |
Audit PII or sensitive column completeness across environments |
Metric reconciliation on |
Use Absolute difference type with a threshold of |
Check Types¶
Alation Data Quality supports the following reconciliation check types. You can combine more than one type within a single source-target pairing.
Check Type |
What It Compares |
Best For |
|---|---|---|
Metric reconciliation |
A single aggregate value (row count, sum, average, min, max, or custom SQL) on both sides. |
High-level smoke tests, daily replication sanity checks, and financial total verification. |
Schema reconciliation |
Column names and data types between the two tables. The result is pass or fail only: either the structure lines up or it does not. |
Detecting schema drift before it breaks downstream pipelines or BI reports. |
Content reconciliation |
Actual row contents between source and target, using a bucket-based hash algorithm. |
Post-migration validation, compliance and audit requirements, and confirming pipeline output matches the source. |
Add a Reconciliation Check¶
Reconciliation checks are configured within the standard monitor creation workflow.
Note
Reconciliation checks require both the source and target data sources to be
cataloged and configured in Alation with appropriate service account credentials
and SELECT privileges on all tables being compared. All raw data remains within
each respective database.
Navigate to the Alation Data Quality application or catalog table page and click Add Monitor.
In the Select Scope section, choose the source data source and table.
In the Configure Checks section, click Add Check > Reconciliation.
In the Reconciliation Target panel, select the target data source from the Target Data Source dropdown.
Select the target table from the Target Table dropdown.
(Optional) Enter a Source Filter and a Target Filter to restrict the comparison to a specific data slice, for example
created_at >= '2026-01-01'. Filters apply to all reconciliation checks configured within this source-target pairing. Adding filters is recommended to avoid running queries against the full table, which can be expensive on large datasets.Configure one or more check types as described in the following steps.
To add a metric reconciliation check:
Click Metric > Add Metric Check.
Select a Metric: Row Count, Sum, Average, Min, Max, or Custom SQL.
For column-level metrics (Sum, Average, Min, Max), select a source column and its corresponding target column.
For Custom SQL, enter a Source SQL expression and a Target SQL expression. Each expression must return a single numeric value.
Select a comparison Operator (
=,<,<=,>,>=) and enter a Threshold value.Select a Difference Type: Absolute (the raw numeric difference between the two sides) or Percent (the difference as a percentage of the source value).
To add a schema reconciliation check:
Click Schema > Add Schema Reconcilation.
(Optional) In the Column Mapping table, add entries only for columns whose names differ between source and target. For example, if
nameon the source is calledfull_nameon the target. Columns with identical names are matched automatically. All columns present in either table are compared regardless of whether a mapping entry exists for them.(Optional) In the Type Equivalence section, add rules for type names that should be treated as equivalent across databases. For example,
bit=boolfor SQL Server-to-Snowflake comparisons.
To add a content reconciliation check:
Click Content.
In Key Columns, select the columns that uniquely identify each row, for example a primary key. Map each source column to its corresponding target column. This field is required.
In Comparison Columns, select the columns whose values will be compared between source and target. Only the values in these columns are evaluated, not the entire row. Map each source column to its corresponding target column. This field is required.
Select a comparison Operator and enter a Threshold to define the acceptable level of drift. For example, less than
1percent of rows may differ.Select a Difference Type: Absolute (number of differing rows) or Percent (proportion of differing rows).
Click Save.
Note
You can add multiple check types within a single reconciliation target pairing. For example, you can run schema, metric, and content checks against the same two tables in one monitor. Each check type runs independently and reports its own pass or fail result.
Reading Results¶
After a monitor runs, you can review reconciliation check results in the Check Results tab. The following sections describe how to interpret each check type.
Metric Reconciliation Results¶
A metric reconciliation result is a single side-by-side comparison. The result shows the following information:
Source value: The metric measured on the source side.
Target value: The same metric measured on the target side.
Difference: How far apart the two values are, reported as either an absolute number or a percentage of the source value, depending on your configuration.
Threshold: Your configured tolerance. The check passes when the difference satisfies the operator.
Note
A 0% difference in row count is not the same as a 0% difference in a column sum. Row count confirms that the same number of rows arrived. Column sum confirms that the same values arrived. Two tables can have identical row counts while containing meaningfully different data.
Schema Reconciliation Results¶
A schema reconciliation result is a structural diff between the two tables. The result is organized into the following four categories:
Matched: Columns present on both sides with compatible types. A healthy result has all columns here.
Type mismatches: Columns with the same name on both sides but incompatible types. For example,
amountisDECIMAL(18,2)on the source andVARCHAR(50)on the target. These are the most concerning finding because data still flows but values may be silently truncated, rounded, or coerced.Missing in target: Columns present on the source that are absent on the target. This often indicates the target schema is behind and downstream consumers may not receive expected fields.
Missing in source: Columns present on the target that are absent on the source. This often indicates leftover columns from a prior schema or unrelated additions on the target side.
A clean schema reconciliation shows all columns under Matched with the remaining three categories empty. Anything outside Matched is worth investigating before relying on downstream content checks. A column that does not exist on one side cannot be reconciled at the row level.
Note
Equivalent types across databases, for example Snowflake NUMBER and SQL
Server DECIMAL, or VARCHAR and TEXT, are treated as a match by
default. Built-in equivalence rules prevent harmless cross-database naming
differences from generating noise.
The following table shows the built-in type equivalences for SQL Server and Snowflake.
SQL Server |
Snowflake |
Equivalent |
|---|---|---|
|
|
Yes |
|
|
Yes |
|
|
Yes |
|
|
Yes |
|
|
Yes |
|
|
Yes |
|
|
Yes |
|
|
Yes |
|
|
Yes |
|
|
Yes |
Content Reconciliation Results¶
A content reconciliation result reports two distinct kinds of findings. Understanding the difference is key to interpreting the result correctly.
Presence drift describes rows that exist on one side but not the other. These counts are exact. If the result reports 50 rows missing in target, exactly 50 rows are missing. Presence drift covers the following scenarios:
Missing in target: Rows present on the source but absent on the target. This indicates lost inserts, failed replication, or incomplete loads.
Extra in target: Rows present on the target but absent on the source. This indicates stale data, deletes that never propagated, or double writes.
Content drift describes rows that exist on both sides under the same key but have different values in one or more columns. To stay efficient at scale, the check identifies regions of the table where differences exist rather than inspecting every individual row. Content drift provides the following information:
Upper bound: No more than this many rows can possibly differ. This is the total number of rows across all affected regions of the table.
Drilldown SQL: A ready-to-run query scoped to just the affected regions. Copy and run it in your SQL client to see the actual differing rows without writing the query yourself.
The following table shows how the check interprets the per-bucket hash comparison results. Each row in the table represents a distinct pattern the check can detect, helping you identify whether a discrepancy is caused by missing rows, key changes, or value-level changes within existing rows.
Row Count Match |
Key Hash Match |
Row Hash Match |
Diagnosis |
|---|---|---|---|
Yes |
Yes |
Yes |
Bucket is clean. No issues detected. |
No |
N/A |
N/A |
Presence issue: rows are missing or extra. |
Yes |
No |
N/A |
Presence issue: keys were swapped (a delete and insert occurred). |
Yes |
Yes |
No |
Content issue: the same rows exist but one or more values changed. |
Every content reconciliation result includes a one-line headline summary. For example:
“90.04% of data is byte-identical. 55 sectors have row-count drift, 47 sectors have content drift.”
The headline percentage represents the proportion of table regions where source and target are fully identical at the row level. A region is a bucket of rows grouped by key hash. When a region is clean, every row within it has the same key and the same values on both sides. When a region has drift, the drilldown SQL is scoped to that region so you can inspect only the affected rows.
Use the headline percentage to gauge severity at a glance using the following guidance.
Percent Clean |
Typical Interpretation |
|---|---|
100% |
Source and target are byte-identical. |
99 to 99.99% |
Trace drift. Usually in-flight rows, timezone edges, or rounding. |
95 to 99% |
Meaningful drift. Investigate before relying on the target. |
Less than 95% |
Significant disagreement. Treat the target as suspect until resolved. |
Resolve a Failed Reconciliation Check¶
Review presence drift first. Those counts are exact and usually point at a concrete pipeline problem such as a missed insert, a failed load, or a delete that never propagated to the target.
Review content drift. The upper bound tells you the maximum number of rows that could be affected. If that count exceeds your tolerance, run the drilldown query to identify the actual differing rows.
Run the drilldown query. The query is pre-generated and scoped to just the regions that disagree. Copy, paste, and run it in your SQL client to see the actual offending rows.
Check the headline percentage. If the result shows 99.9% clean and your pipeline is near-real-time, the drift may represent rows that had not yet arrived when the check ran. Re-run the check or narrow the filter window before escalating.
Cross-Database Normalization¶
To produce consistent hash values across different database engines, all column values are normalized before hashing. Without this normalization, MD5 hashes would never match even for logically identical data.
The following table describes the normalization rules applied to each data type.
Data Type |
Normalization Rule |
Example |
|---|---|---|
Strings |
|
|
Numerics |
|
|
Timestamps |
Convert to UTC, format as ISO 8601 |
|
Booleans |
Map to |
|
NULLs |
Replace with sentinel |
|
Empty strings |
Treat as NULL |
|
Normalized column values are concatenated with a | separator before hashing:
CONCAT(norm_key1, '|', norm_key2, '|', norm_val1, '|', norm_val2)
MD5 is used on both platforms. It is deterministic: the same input string produces
the same 128-bit hash regardless of the database engine. The first 16 hex characters
of the MD5 output are converted to a BIGINT for numeric aggregation.
Scale Guardrails¶
Alation Data Quality applies the following guardrails to reconciliation checks to protect query performance and warehouse compute costs.
Guardrail |
Behavior |
|---|---|
Query timeout |
Configurable per run. The default is 60 minutes. The check fails cleanly on timeout. |
Partition pruning |
All generated SQL includes the filter clause you define. For ongoing monitoring of large tables, filter to recent data only using the source and target filter fields. |
Early exit |
If all buckets match in the first step of content reconciliation, no drilldown SQL is generated and the check completes immediately. |
Concurrent run limit |
A maximum of one reconciliation run per data source pair runs at a time. |
Important Notes¶
Snapshots, not live views: A reconciliation result reflects the state of source and target at the moment the check ran. In near-real-time pipelines, a small amount of drift may represent in-flight data rather than a real problem. Re-run the check or add a time filter to confirm.
Filters apply to both sides independently: When a check is configured with a source filter and a target filter, each filter applies only to its respective side. Drift outside either filter window is not visible to this check by design.
NULLs are treated as values: Two rows with
NULLin the same column are considered equal. A row withNULLon one side and an empty string ('') on the other is a mismatch.Cross-database differences are normalized automatically: When source and target use different database technologies, the check normalizes boolean representations, timestamp timezones, decimal precision, and similar variations so that the same value stored differently does not appear as a mismatch. No additional configuration is required.
Cost scales with check type: Schema reconciliation is lightweight (a metadata query). Metric reconciliation runs one aggregate query per side. Content reconciliation reads both full tables and scales with data volume. Schedule content reconciliation less frequently on very large datasets.