Uniqueness Checks¶
Uniqueness checks ensure there are no duplicate values in columns where uniqueness is required, such as primary keys, unique identifiers, and transaction IDs.
Follow these best practices for uniqueness checks:
Apply to all primary key columns to validate database constraints.
Use percentage-based thresholds for columns where some duplication is acceptable.
Combine with completeness checks for columns that should be both unique and non-null.
Consider case sensitivity for text-based unique identifiers
Consider the following components for a uniqueness check:
Metric Type: The uniqueness measurement method (duplicate_count or duplicate_percent).
Target Column(s): Single or multiple columns being evaluated for uniqueness.
Threshold: Acceptable level of duplication (typically zero for strict uniqueness).
Scope: Whether to check individual columns or composite key combinations.
Case Sensitivity: Handle text case variations in uniqueness evaluation.
Temporal Context: Time-based filtering for incremental uniqueness validation.
Available Metrics¶
Metric |
Description |
Supported Data Types |
Best Used For |
---|---|---|---|
duplicate_count |
The count of distinct values that have duplicates. Multiple column names can be specified to count duplicate sets of values |
number, text, time |
Absolute duplicate detection, small datasets, zero-tolerance scenarios |
duplicate_percent |
duplicate_count over the total row count, expressed as a percentage |
number, text, time |
Large datasets, tolerance-based validation, trend analysis |
Configuration Examples¶
Check Description |
Configuration Steps |
Example Check |
---|---|---|
Ensure no duplicate customer IDs |
|
|
Validate low percentage of duplicate email addresses |
|
|
Common Uniqueness Check Types¶
Primary Key Uniqueness¶
-- Customer ID uniqueness (zero tolerance)
duplicate_count(customer_id) = 0
-- Transaction ID uniqueness
duplicate_count(transaction_id) = 0
-- Order number uniqueness
duplicate_count(order_number) = 0
Business Identifier Uniqueness¶
-- Product SKU uniqueness
duplicate_count(product_sku) = 0
-- Employee ID uniqueness
duplicate_count(employee_id) = 0
-- Account number uniqueness
duplicate_count(account_number) = 0
Contact Information Uniqueness¶
-- Email address uniqueness (low tolerance)
duplicate_percent(email_address) <= 1%
-- Phone number uniqueness (moderate tolerance)
duplicate_percent(phone_number) <= 2%
-- Social Security Number uniqueness (zero tolerance)
duplicate_count(ssn) = 0
Composite Key Uniqueness¶
-- Order line uniqueness (order + line number)
duplicate_count(order_id, line_number) = 0
-- Employee position uniqueness (employee + department)
duplicate_count(employee_id, department_code) = 0
-- Product variant uniqueness (product + variant)
duplicate_count(product_id, variant_code) = 0
Conditional Uniqueness¶
-- Active product SKU uniqueness only
duplicate_count(product_sku) = 0:
filter: product_status = 'ACTIVE'
-- Current employee ID uniqueness
duplicate_count(employee_id) = 0:
filter: employment_status = 'ACTIVE'
-- Annual invoice number uniqueness
duplicate_count(invoice_number) = 0:
filter: YEAR(invoice_date) = 2024
Case-Sensitive vs Case-Insensitive Uniqueness¶
-- Case-sensitive username uniqueness
duplicate_count(username) = 0
-- Case-insensitive email uniqueness (requires custom SQL)
duplicate_count(LOWER(email_address)) = 0
-- Case-insensitive product code uniqueness
duplicate_count(UPPER(product_code)) = 0