Best Practices For Curation Automation

Applies to: Alation Cloud Service instances of Alation

Curation Automation is Alation’s AI-powered framework for scaling data governance across your catalog. By replacing manual updates with a standardized, rule-based system, it ensures that metadata remains consistent and reliable as your data environment grows. Here are some best practices for designing and maintaining effective curation rules.

Choose a High-Impact Use Case

Before creating a rule, choose a focused, outcome-driven use case.

Examples

Improve Discoverability

Fill missing descriptions across the Gold layer so business users can easily understand and trust certified datasets.

Strengthen Compliance

Classify PII across production schemas to ensure sensitive fields are consistently tagged.

Avoid starting with broad experimentation. Choose one measurable outcome.

Write a Strong and Context-Rich Purpose

The Purpose field is not just documentation. It directly influences how the AI interprets your metadata.

A good purpose answers the following questions:

  • What are you trying to achieve?

  • Why are you doing it?

  • Who is the audience?

  • What is your industry or domain?

Example

Here’s an example of a Purpose for a rule designed to fill in missing descriptions in the Clinical Gold layer:

We are a healthcare organization. This rule ensures that all tables and columns in the Clinical Gold layer have clear, business-friendly descriptions. These descriptions must avoid technical jargon and help clinicians and analysts understand the dataset’s purpose without referencing internal system codes.

For more information on what to focus on when drafting the Purpose for your rule, see Purpose.

Incremental Scope Expansion

Scope defines which assets the rule applies to. Start with a narrow scope and widen it in stages:

  1. Start with one schema, all tables, and all columns.

  2. Run the rule.

  3. Review outcomes.

  4. Edit the rule to expand the scope, for example, to include three or more schemas.

Increase the scope further over time. For more information on scope selection, see Scope Selection.

Bottom-Up Approach (Columns First)

When curating with rules, consider starting with columns, following up with tables, and then expanding to schemas. This approach is effective because columns are the most granular layer of data.

If column descriptions and classifications are accurate, you will see the following benefits:

  • Table descriptions improve automatically

  • PII classifications are more precise

  • Trust signals become stronger

Bottom-up curation produces consistent results.

Consider Explicit and Detailed Field Selection

When configuring fields, remember that each field allows up to 2,000 characters of AI instruction. Use that space to give explicit, detailed guidance for the field.

Example AI Instructions

The following examples illustrate how to tailor instructions to specific field semantics to achieve higher-quality curation results.

Description

Generate a clear, accurate description of this data object based on its name, metadata, relationships, and available context. Explain its purpose, the type of information it represents, and how it fits within the broader dataset or domain.

Title

Create a clean, readable, business-friendly title for this object by interpreting its technical name, abbreviations, and conventions. Expand acronyms where appropriate and produce a meaningful title that helps users quickly understand what the object represents without altering the underlying intent.
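Because each field accepts up to 2,000 characters of AI instruction, it can help to check the length of a draft before pasting it into the rule. The following is a minimal sketch that runs outside Alation. The helper name `check_instruction` is hypothetical, and counting characters with Python's `len` is an assumption, since how the product counts whitespace or special characters is not documented here.

```python
from typing import Tuple

# Per-field limit stated in this guide.
MAX_INSTRUCTION_CHARS = 2000


def check_instruction(text: str, limit: int = MAX_INSTRUCTION_CHARS) -> Tuple[bool, int]:
    """Return (fits, remaining); remaining is negative when the text is too long.

    Hypothetical helper: it counts characters with len(), which is an
    assumption about how the limit is measured.
    """
    used = len(text)
    return used <= limit, limit - used


# Draft instructions taken from the examples in this section.
instructions = {
    "Description": (
        "Generate a clear, accurate description of this data object based on "
        "its name, metadata, relationships, and available context. Explain its "
        "purpose, the type of information it represents, and how it fits "
        "within the broader dataset or domain."
    ),
    "Title": (
        "Create a clean, readable, business-friendly title for this object by "
        "interpreting its technical name, abbreviations, and conventions."
    ),
}

for field, text in instructions.items():
    fits, remaining = check_instruction(text)
    status = "ok" if fits else "too long"
    print(f"{field}: {status} ({remaining} characters remaining)")
```

Both example instructions use only a small part of the limit, which leaves room for domain-specific guidance such as naming conventions or terms to avoid.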

Use the Preview to Validate and Tune

The Preview screen is a tuning environment. Use it to validate your assumptions, test different instructions, and confirm that the rule produces the desired results before you run it. In the Preview, you can:

  • Search and preview multiple objects

  • Edit instructions

  • Regenerate values

  • Compare output changes in real time

Try the rule on multiple objects and regenerate until the output quality is acceptable, then run the rule.

Treat the Preview as an iteration space, not just a place to check your rule configuration.

Understand AI Action Estimates

Before you run your rule, the system provides an estimated action count that predicts how many metadata fields will be updated based on your current configuration and AI instructions. The estimate assumes that all blank fields in your selection will be populated, so it is an upper bound: the actual number of actions consumed never exceeds it. The final count can be lower than the estimate because the AI follows strict quality gates to ensure data integrity:

  • Confidence Thresholds: Only high-confidence values (80% or more) are applied to your assets. Medium and low-confidence suggestions are discarded to prevent inaccuracies.

  • Success-Based Billing: AI actions are only deducted from your balance when a value is successfully applied to a field.
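The relationship between the estimate and the actions actually consumed can be illustrated with a short sketch. This is not Alation's implementation. The `Suggestion` class, the numeric `confidence` score between 0 and 1, and both function names are hypothetical, introduced only to show why the estimate is an upper bound.

```python
from dataclasses import dataclass
from typing import List, Optional

# Threshold stated in this guide: only values at 80% confidence or more are applied.
HIGH_CONFIDENCE = 0.80


@dataclass
class Suggestion:
    """Hypothetical record of one AI-generated value for one field."""
    field: str
    current_value: Optional[str]  # None or "" means the field is blank
    confidence: float             # assumed score between 0 and 1


def estimate_actions(suggestions: List[Suggestion]) -> int:
    """Upper bound: assume every blank field in scope gets populated."""
    return sum(1 for s in suggestions if not s.current_value)


def consumed_actions(suggestions: List[Suggestion]) -> int:
    """Count only blank fields whose suggestion clears the threshold."""
    return sum(
        1
        for s in suggestions
        if not s.current_value and s.confidence >= HIGH_CONFIDENCE
    )


suggestions = [
    Suggestion("orders.description", None, 0.92),          # applied
    Suggestion("orders.title", "", 0.80),                  # applied (exactly 80%)
    Suggestion("customers.description", None, 0.55),       # discarded
    Suggestion("customers.title", "Customers", 0.95),      # not blank, skipped
]

estimate = estimate_actions(suggestions)
consumed = consumed_actions(suggestions)
print(f"estimated: {estimate}, consumed: {consumed}")  # estimated: 3, consumed: 2
```

In this sketch, three fields are blank, so the estimate is 3, but only two suggestions reach the 80% threshold, so only 2 actions would be deducted.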

If the initial AI action estimate looks too high, consider doing the following:

  • Reduce scope

  • Remove asset groups that are not relevant to your use case

  • Limit to specific schemas

For more information, see AI Instructions.