Best Practices For Curation Automation¶
Applies to Alation Cloud Service instances of Alation
Curation Automation is Alation’s AI-powered framework for scaling data governance across your catalog. By replacing manual updates with a standardized, rule-based system, it ensures that metadata remains consistent and reliable as your data environment grows. Here are some best practices for designing and maintaining effective curation rules.
Choose a High-Impact Use Case¶
Before creating a rule, choose a focused, outcome-driven use case.
Examples¶
Improve Discoverability¶
Fill missing descriptions across the Gold layer so business users can easily understand and trust certified datasets.
Strengthen Compliance¶
Classify PII across production schemas to ensure sensitive fields are consistently tagged.
Avoid starting with broad experimentation. Choose one measurable outcome.
Write a Strong and Context-Rich Purpose¶
The Purpose field is not just documentation. It directly influences how the AI interprets your metadata.
A good purpose answers the following questions:
What are you trying to achieve?
Why are you doing it?
Who is the audience?
What is your industry or domain?
Example¶
Here’s an example of a Purpose for a rule designed to fill in missing descriptions in the Clinical Gold layer:
We are a healthcare organization. This rule ensures that all tables and columns in the Clinical Gold layer have clear, business-friendly descriptions. These descriptions must avoid technical jargon and help clinicians and analysts understand the dataset’s purpose without referencing internal system codes.
For more information on what to focus on when drafting the Purpose for your rule, see Purpose.
Incremental Scope Expansion¶
Scope defines which assets the rule applies to. Therefore, consider adopting the following approach:
Start with one schema, all tables, and all columns.
Run the rule.
Review outcomes.
Edit the rule to expand the scope, for example, to include three or more schemas.
Increase the scope further over time.
For more information on scope selection, see Scope Selection.
Bottom-Up Approach (Columns First)¶
When curating with your rule, consider starting with columns, then moving to tables, and finally expanding to schemas. This approach is effective because columns are the most granular layer of data.
If column descriptions and classifications are accurate, you will see the following benefits:
Table descriptions improve automatically
PII classifications are more precise
Trust signals become stronger
Bottom-up curation produces consistent results.
Consider Explicit and Detailed Field Selection¶
When configuring fields, remember that each field allows up to 2,000 characters of AI instructions, so use that space deliberately.
Example AI Instructions¶
The following examples illustrate how to tailor instructions to specific field semantics to achieve higher-quality curation results.
Description¶
Generate a clear, accurate description of this data object based on its name, metadata, relationships, and available context. Explain its purpose, the type of information it represents, and how it fits within the broader dataset or domain.
Title¶
Create a clean, readable, business-friendly title for this object by interpreting its technical name, abbreviations, and conventions. Expand acronyms where appropriate and produce a meaningful title that helps users quickly understand what the object represents without altering the underlying intent.
Use the Preview to Validate and Tune¶
The Preview screen is a tuning environment where you validate your assumptions, test different instructions, and confirm that the rule produces the desired results before you run it. Use it to:
Search and preview multiple objects
Edit instructions
Regenerate values
Compare output changes in real time
Test with multiple objects and regenerate until the output quality is acceptable, then run the rule.
Treat the preview as an iteration space, not just a final check of your rule configuration.
Understand AI Action Estimates¶
Before you initiate your rule, the system provides an estimated action count to predict how many metadata fields will be updated based on your current configuration and AI instructions. This estimate is based on a model that assumes all blank fields in your selection will be populated, though the actual actions consumed will always be no more than this estimate. The discrepancy between the estimate and the final count occurs because the AI follows strict quality gates to ensure data integrity:
Confidence Thresholds: Only high-confidence values (80% or more) are applied to your assets. Medium and low-confidence suggestions are discarded to prevent inaccuracies.
Success-Based Billing: AI actions are only deducted from your balance when a value is successfully applied to a field.
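The relationship between the estimate and the final count can be sketched with a toy model. Everything here is illustrative: the field names and confidence scores are hypothetical, and this is not Alation's actual implementation, only a way to see how the 80% confidence threshold and success-based billing make consumed actions fall at or below the estimate.

```python
# Toy model of AI action estimates vs. actions actually consumed.
# Field names and confidence scores are hypothetical examples.

CONFIDENCE_THRESHOLD = 0.80  # only high-confidence values are applied

# Hypothetical blank fields in scope, with the confidence the AI
# assigns to its suggested value for each one.
suggestions = {
    "sales.orders.description": 0.93,
    "sales.orders.title": 0.88,
    "sales.customers.ssn.classification": 0.95,
    "sales.tmp_load.description": 0.41,   # low confidence: discarded
    "sales.audit_log.description": 0.67,  # medium confidence: discarded
}

# The estimate assumes every blank field in scope gets populated.
estimated_actions = len(suggestions)

# Actions are only deducted when a value clears the threshold
# and is successfully applied (success-based billing).
consumed_actions = sum(
    1 for confidence in suggestions.values()
    if confidence >= CONFIDENCE_THRESHOLD
)

print(f"Estimated actions: {estimated_actions}")  # 5
print(f"Consumed actions:  {consumed_actions}")   # 3
```

In this sketch, five blank fields produce an estimate of five actions, but only the three suggestions at or above the threshold are applied and billed, which is why the final count is never more than the estimate.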
If the initial AI action estimate looks too high, consider doing the following:
Reduce scope
Remove asset groups that are not relevant to your use case
Limit to specific schemas
For more information, see AI Instructions.