Configure Metadata Extraction¶

Alation Cloud Service Applies to Alation Cloud Service instances of Alation

Customer Managed Applies to customer-managed instances of Alation

Enhanced Connector Enhanced connectors add extended capabilities and require a separate entitlement in addition to your Alation platform license.

To configure metadata extraction (MDE) for the MaxCompute data source in Alation, follow these steps:

Test Access and Fetch Projects
Select Projects for Extraction
Run Extraction

Test Access and Fetch Projects¶

Before extracting metadata, test the connection to MaxCompute and fetch the available projects:

On the MaxCompute data source Settings page, go to the Metadata Extraction tab.
Under Step 1: Test Access and Fetch Schemas, click Run.
Alation tests the connection to MaxCompute and retrieves the list of available projects (schemas). If the connection succeeds, the status shows Success. If errors occur, review the error messages and verify your connection settings in General Settings.

Select Projects for Extraction¶

After successfully fetching projects, configure which projects to extract:

Under Step 2: Select schemas for extraction, you will see the list of available MaxCompute projects.
Select one or more projects by clicking the checkbox next to each project name.

Note

If you do not select projects, Alation extracts metadata from all accessible projects.
Use the Search by name field to filter projects if you have a large number of projects.
(Optional) Turn on the Enable advanced settings toggle to access additional filter options:

Filter Option

Description

Extract all except selected

Extract metadata from all projects except those selected.

Extract only selected

Extract metadata from only the selected projects.
(Optional) Check Keep the catalog synchronized with the current selection of schemas to remove previously extracted projects that are no longer selected.

Run Extraction¶

After configuring project selection, run the metadata extraction:

Under Step 3: Run extraction, click Run extraction to extract metadata from MaxCompute.
To enable automated metadata updates, under Automated and Manual Extraction, turn on Enable Automated Extraction and select the desired schedule (day and time). Metadata extraction will run automatically at the scheduled time.
Click Go to MDE Job History to view the status of the extraction job.

The extraction job will run and extract metadata based on your project selection. The status is logged in the MDE Job History tab.

Extracted Metadata¶

The connector extracts the following metadata objects:

MaxCompute Object	Alation Object
Project	Schema
Internal Table	Table
External Table	Table
View	View (includes view SQL/DDL)
Materialized View	View (without view SQL)
Column (including partition columns)	Column
User-Defined Function (UDF)	Function

Lineage Extraction¶

Lineage is extracted automatically during metadata extraction using the DataWorks Public API:

Table-Level Lineage: Extracted by default for all tables
Column-Level Lineage: Extracted if Enable Column Lineage is active in General Settings

Note

Column lineage extraction increases extraction time. Disable it for faster extraction if column-level lineage is not required.

Lineage relationships are displayed in the Alation lineage graph, including:

Source table to target table relationships
Column-to-column data flow
DataWorks ETL task information (displayed as Dataflow objects)

Script Details:

The connector also extracts script-related metadata from DataWorks ETL tasks:

Field	Description
Task	The name of the DataWorks task.
Status	Task execution status (for example, Success, Failure).
Trigger	How the task was triggered (for example, Scheduler, Manual).
Owner	The owner of the task.
TaskId	Unique identifier for the task.
StartedTime	When the task started executing.
FinishedTime	When the task finished executing.
Script Name	The name of the script.
Script Path	The path to the script in the DataWorks workspace.
Schedule	Cron expression with timezone (for example, `00 27 00 * * ? (America/New_York)`).
Recurrence	Recurrence type (for example, Normal).
SQL	The actual MaxCompute SQL content of the script.

These details are extracted using the DataWorks GetTaskInstance and ListNodes APIs and are displayed as part of the Dataflow objects in the lineage graph.

../../../_images/MaxComputeLineageScript.png

The script details include task name, status, trigger type, start/end times, schedule (cron expression), and the actual SQL content.

Performance Optimization¶

For large environments with many projects and tables:

Enable Parallel Processing: Turn on parallel API calls in General Settings for faster extraction.
Use Table Filters: Configure table name filters in General Settings to extract only relevant tables.
Keep Rate Limiting Enabled: If you experience API throttling errors, ensure rate limiting remains enabled in General Settings to cap API calls at 40 QPS.
Deactivate Enable Column Lineage in General Settings for faster extraction if column-level lineage is not required.

Filter Option	Description
Extract all except selected	Extract metadata from all projects except those selected.
Extract only selected	Extract metadata from only the selected projects.