Completeness Checks¶
Completeness checks verify that required data is present and not missing (NULL). These are critical for ensuring data integrity and preventing downstream processing issues. Completeness checks identify missing values, including NULL values and custom-defined missing value patterns.
Follow these best practices for completeness checks:
Prioritize critical business columns such as financial values and timestamps.
Consider appropriate thresholds based on business requirements (not all columns need to be 100% complete).
Use count-based checks for smaller tables and percentage-based checks for larger tables.
Combine with freshness checks for time-based columns to ensure both presence and recency.
Define custom missing value patterns to catch business-specific incomplete data.
Available Metrics¶
Metric |
Configuration Keys |
Description |
Supported Data Types |
---|---|---|---|
missing_count |
missing values |
The number of rows in a column that contain NULL values and any other user-defined values that qualify as missing |
number, text, time |
missing regex |
Text-specific missing count with regex support for pattern-based missing value detection |
text |
|
missing_percent |
missing values |
The percentage of rows in a column, relative to the total row count, that contain NULL values and any other user-defined values that qualify as missing |
number, text, time |
missing regex |
Text-specific missing percentage with regex support for pattern-based missing value detection |
text |
Configuration Keys¶
Configuration Key |
Description |
Expected Value Type |
Data Types |
---|---|---|---|
missing regex |
Specifies a regular expression to define your own custom missing values |
regex, no forward slash delimiters |
text only |
missing values |
Specifies the values that should be considered missing in addition to NULL |
values in a list |
number, text, time |
Configuration Examples¶
Check Description |
Configuration Steps |
Example Check |
---|---|---|
Ensure all products have prices |
|
|
Validate low percentage of missing email addresses |
|
|
Check for custom missing values |
|
|
Common Completeness Check Scenarios¶
Business Requirement |
Configuration Approach |
Example |
Use Case |
---|---|---|---|
Critical Data Fields |
Zero tolerance for missing values |
|
Primary keys, required identifiers |
Contact Information |
Allow small percentage of missing data |
|
Optional contact fields |
Financial Data |
Zero tolerance with custom missing codes |
|
Revenue, pricing, financial metrics |
Status Fields |
Custom missing value detection |
|
Workflow status tracking |
Descriptive Fields |
Pattern-based missing detection |
|
|
Date Fields |
Zero tolerance for critical dates |
|
Audit trails, timestamps |
Default Missing Value Detection¶
By default, Alation Data Quality considers the following as missing values:
NULL values: Standard database NULL values.
Empty strings: Zero-length text values.
Whitespace-only strings: Strings containing only spaces, tabs, or newlines.
Custom Missing Value Patterns¶
Value-Based Missing Detection¶
Define specific values that should be treated as missing.
-- Treat multiple variations of "not available" as missing
missing_count(customer_type) = 0:
missing values: [N/A, NA, n/a, none, NONE, TBD, unknown]
-- Handle numeric codes that represent missing data
missing_count(age) = 0:
missing values: [-1, 999, 0]
Pattern-Based Missing Detection¶
Use regular expressions to identify missing value patterns.
-- Identify various "not applicable" patterns
missing_count(description) = 0:
missing regex: (?i)(n/?a|not available|tbd|to be determined)
-- Detect placeholder patterns
missing_count(phone_number) = 0:
missing regex: (000-000-0000|999-999-9999|xxx-xxx-xxxx)
Multi-Pattern Missing Detection¶
Combine multiple patterns to catch various missing data representations.
-- Comprehensive customer data validation
missing_count(customer_name) = 0:
missing values: [N/A, NA, none, NONE, TBD, unknown, UNKNOWN]
missing regex: (?i)(not provided|to be updated|pending)
Business Rule-Based Completeness¶
Apply completeness checks based on business logic.
-- Shipping address required for physical products only
missing_count(shipping_address) = 0:
filter: product_type = 'PHYSICAL'
-- Contact info required for premium customers
missing_count(phone_number) = 0:
filter: customer_tier = 'PREMIUM'
Conditional Completeness Validation¶
Ensure completeness based on other field values.
-- Payment method required for paid orders
missing_count(payment_method) = 0:
filter: order_total > 0
-- Manager approval required for large discounts
missing_count(approval_id) = 0:
filter: discount_percent > 20