Uniqueness Checks

Uniqueness checks ensure there are no duplicate values in columns where uniqueness is required, such as primary keys, unique identifiers, and transaction IDs.

Follow these best practices for uniqueness checks:

  • Apply to all primary key columns to validate database constraints.

  • Use percentage-based thresholds for columns where some duplication is acceptable.

  • Combine with completeness checks for columns that should be both unique and non-null.

  • Consider case sensitivity for text-based unique identifiers

Consider the following components for a uniqueness check:

  • Metric Type: The uniqueness measurement method (duplicate_count or duplicate_percent).

  • Target Column(s): Single or multiple columns being evaluated for uniqueness.

  • Threshold: Acceptable level of duplication (typically zero for strict uniqueness).

  • Scope: Whether to check individual columns or composite key combinations.

  • Case Sensitivity: Handle text case variations in uniqueness evaluation.

  • Temporal Context: Time-based filtering for incremental uniqueness validation.

Available Metrics

Metric

Description

Supported Data Types

Best Used For

duplicate_count

The count of distinct values that have duplicates. Multiple column names can be specified to count duplicate sets of values

number, text, time

Absolute duplicate detection, small datasets, zero-tolerance scenarios

duplicate_percent

duplicate_count over the total row count, expressed as a percentage

number, text, time

Large datasets, tolerance-based validation, trend analysis

Configuration Examples

Check Description

Configuration Steps

Example Check

Ensure no duplicate customer IDs

  1. Select customer_id column.

  2. Choose Duplicate Count check.

  3. Select = operator.

  4. Set threshold to 0.

duplicate_count(customer_id) = 0

Validate low percentage of duplicate email addresses

  1. Select email column.

  2. Choose Duplicate Percentage check.

  3. Select <= operator.

  4. Set threshold to 0.1.

duplicate_percent(email) <= 0.1

Common Uniqueness Check Types

Primary Key Uniqueness

-- Customer ID uniqueness (zero tolerance)
duplicate_count(customer_id) = 0

-- Transaction ID uniqueness
duplicate_count(transaction_id) = 0

-- Order number uniqueness
duplicate_count(order_number) = 0

Business Identifier Uniqueness

-- Product SKU uniqueness
duplicate_count(product_sku) = 0

-- Employee ID uniqueness
duplicate_count(employee_id) = 0

-- Account number uniqueness
duplicate_count(account_number) = 0

Contact Information Uniqueness

-- Email address uniqueness (low tolerance)
duplicate_percent(email_address) <= 1%

-- Phone number uniqueness (moderate tolerance)
duplicate_percent(phone_number) <= 2%

-- Social Security Number uniqueness (zero tolerance)
duplicate_count(ssn) = 0

Composite Key Uniqueness

-- Order line uniqueness (order + line number)
duplicate_count(order_id, line_number) = 0

-- Employee position uniqueness (employee + department)
duplicate_count(employee_id, department_code) = 0

-- Product variant uniqueness (product + variant)
duplicate_count(product_id, variant_code) = 0

Conditional Uniqueness

-- Active product SKU uniqueness only
duplicate_count(product_sku) = 0:
filter: product_status = 'ACTIVE'

-- Current employee ID uniqueness
duplicate_count(employee_id) = 0:
filter: employment_status = 'ACTIVE'

-- Annual invoice number uniqueness
duplicate_count(invoice_number) = 0:
filter: YEAR(invoice_date) = 2024

Case-Sensitive vs Case-Insensitive Uniqueness

-- Case-sensitive username uniqueness
duplicate_count(username) = 0

-- Case-insensitive email uniqueness (requires custom SQL)
duplicate_count(LOWER(email_address)) = 0

-- Case-insensitive product code uniqueness
duplicate_count(UPPER(product_code)) = 0