Completeness Checks

Completeness checks verify that required data is present and not missing (NULL). These are critical for ensuring data integrity and preventing downstream processing issues. Completeness checks identify missing values, including NULL values and custom-defined missing value patterns.

Follow these best practices for completeness checks:

  • Prioritize critical business columns such as financial values and timestamps.

  • Consider appropriate thresholds based on business requirements (not all columns need to be 100% complete).

  • Use count-based checks for smaller tables and percentage-based checks for larger tables.

  • Combine with freshness checks for time-based columns to ensure both presence and recency.

  • Define custom missing value patterns to catch business-specific incomplete data.

Available Metrics

Metric

Configuration Keys

Description

Supported Data Types

missing_count

missing values

The number of rows in a column that contain NULL values and any other user-defined values that qualify as missing

number, text, time

missing regex

Text-specific missing count with regex support for pattern-based missing value detection

text

missing_percent

missing values

The percentage of rows in a column, relative to the total row count, that contain NULL values and any other user-defined values that qualify as missing

number, text, time

missing regex

Text-specific missing percentage with regex support for pattern-based missing value detection

text

Configuration Keys

Configuration Key

Description

Expected Value Type

Data Types

missing regex

Specifies a regular expression to define your own custom missing values

regex, no forward slash delimiters

text only

missing values

Specifies the values that should be considered missing in addition to NULL

values in a list

number, text, time

Configuration Examples

Check Description

Configuration Steps

Example Check

Ensure all products have prices

  1. Select price column.

  2. Choose Missing Count check.

  3. Select = operator.

  4. Set threshold to 0.

missing_count(price) = 0

Validate low percentage of missing email addresses

  1. Select email column.

  2. Choose Missing Percentage check.

  3. Select <= operator.

  4. Set threshold to 5.

missing_percent(email) <= 5

Check for custom missing values

  1. Select status column.

  2. Choose Missing Count check.

  3. Add configuration: missing values: N/A, none, TBD

  4. Set threshold to 0.

missing_count(status) = 0 with custom missing values

Common Completeness Check Scenarios

Business Requirement

Configuration Approach

Example

Use Case

Critical Data Fields

Zero tolerance for missing values

missing_count(customer_id) = 0

Primary keys, required identifiers

Contact Information

Allow small percentage of missing data

missing_percent(email) <= 5%

Optional contact fields

Financial Data

Zero tolerance with custom missing codes

missing_count(revenue) = 0: missing values: [-1, 999999]

Revenue, pricing, financial metrics

Status Fields

Custom missing value detection

missing_count(order_status) = 0: missing values: [TBD, PENDING_REVIEW]

Workflow status tracking

Descriptive Fields

Pattern-based missing detection

missing_percent(description) <= 10%: missing regex: (?i)(tbd coming soon)

Date Fields

Zero tolerance for critical dates

missing_count(created_date) = 0

Audit trails, timestamps

Default Missing Value Detection

By default, Alation Data Quality considers the following as missing values:

  • NULL values: Standard database NULL values.

  • Empty strings: Zero-length text values.

  • Whitespace-only strings: Strings containing only spaces, tabs, or newlines.

Custom Missing Value Patterns

Value-Based Missing Detection

Define specific values that should be treated as missing.

-- Treat multiple variations of "not available" as missing
missing_count(customer_type) = 0:
missing values: [N/A, NA, n/a, none, NONE, TBD, unknown]

-- Handle numeric codes that represent missing data
missing_count(age) = 0:
missing values: [-1, 999, 0]

Pattern-Based Missing Detection

Use regular expressions to identify missing value patterns.

-- Identify various "not applicable" patterns
missing_count(description) = 0:
missing regex: (?i)(n/?a|not available|tbd|to be determined)

-- Detect placeholder patterns
missing_count(phone_number) = 0:
missing regex: (000-000-0000|999-999-9999|xxx-xxx-xxxx)

Multi-Pattern Missing Detection

Combine multiple patterns to catch various missing data representations.

-- Comprehensive customer data validation
missing_count(customer_name) = 0:
missing values: [N/A, NA, none, NONE, TBD, unknown, UNKNOWN]
missing regex: (?i)(not provided|to be updated|pending)

Business Rule-Based Completeness

Apply completeness checks based on business logic.

-- Shipping address required for physical products only
missing_count(shipping_address) = 0:
filter: product_type = 'PHYSICAL'

-- Contact info required for premium customers
missing_count(phone_number) = 0:
filter: customer_tier = 'PREMIUM'

Conditional Completeness Validation

Ensure completeness based on other field values.

-- Payment method required for paid orders
missing_count(payment_method) = 0:
filter: order_total > 0

-- Manager approval required for large discounts
missing_count(approval_id) = 0:
filter: discount_percent > 20