Configure Metadata Extraction

Alation Cloud Service Applies to Alation Cloud Service instances of Alation

Customer Managed Applies to customer-managed instances of Alation

Enhanced Connector Enhanced connectors add extended capabilities and require a separate entitlement in addition to your Alation platform license.

To configure metadata extraction (MDE) for the MaxCompute data source in Alation, follow these steps:

Test Access and Fetch Projects

Before extracting metadata, test the connection to MaxCompute and fetch the available projects:

  1. On the MaxCompute data source Settings page, go to the Metadata Extraction tab.

  2. Under Step 1: Test Access and Fetch Schemas, click Run.

    ../../../_images/MaxComputeTestAccess.png
  3. Alation tests the connection to MaxCompute and retrieves the list of available projects (schemas). If the connection succeeds, the status shows Success. If errors occur, review the error messages and verify your connection settings in General Settings.

Select Projects for Extraction

After successfully fetching projects, configure which projects to extract:

  1. Under Step 2: Select schemas for extraction, you will see the list of available MaxCompute projects.

  2. Select one or more projects by clicking the checkbox next to each project name.

    ../../../_images/MaxComputeSelectProjects.png

    Note

    If you do not select projects, Alation extracts metadata from all accessible projects.

  3. Use the Search by name field to filter projects if you have a large number of projects.

  4. (Optional) Turn on the Enable advanced settings toggle to access additional filter options:

    Filter Option

    Description

    Extract all except selected

    Extract metadata from all projects except those selected.

    Extract only selected

    Extract metadata from only the selected projects.

  5. (Optional) Check Keep the catalog synchronized with the current selection of schemas to remove previously extracted projects that are no longer selected.

Run Extraction

After configuring project selection, run the metadata extraction:

  1. Under Step 3: Run extraction, click Run extraction to extract metadata from MaxCompute.

    ../../../_images/MaxComputeRunExtraction.png
  2. To enable automated metadata updates, under Automated and Manual Extraction, turn on Enable Automated Extraction and select the desired schedule (day and time). Metadata extraction will run automatically at the scheduled time.

  3. Click Go to MDE Job History to view the status of the extraction job.

The extraction job will run and extract metadata based on your project selection. The status is logged in the MDE Job History tab.

Extracted Metadata

The connector extracts the following metadata objects:

MaxCompute Object

Alation Object

Project

Schema

Internal Table

Table

External Table

Table

View

View (includes view SQL/DDL)

Materialized View

View (without view SQL)

Column (including partition columns)

Column

User-Defined Function (UDF)

Function

Lineage Extraction

Lineage is extracted automatically during metadata extraction using the DataWorks Public API:

  • Table-Level Lineage: Extracted by default for all tables

  • Column-Level Lineage: Extracted if Enable Column Lineage is active in General Settings

Note

Column lineage extraction increases extraction time. Disable it for faster extraction if column-level lineage is not required.

Lineage relationships are displayed in the Alation lineage graph, including:

  • Source table to target table relationships

  • Column-to-column data flow

  • DataWorks ETL task information (displayed as Dataflow objects)

Script Details:

The connector also extracts script-related metadata from DataWorks ETL tasks:

Field

Description

Task

The name of the DataWorks task.

Status

Task execution status (for example, Success, Failure).

Trigger

How the task was triggered (for example, Scheduler, Manual).

Owner

The owner of the task.

TaskId

Unique identifier for the task.

StartedTime

When the task started executing.

FinishedTime

When the task finished executing.

Script Name

The name of the script.

Script Path

The path to the script in the DataWorks workspace.

Schedule

Cron expression with timezone (for example, 00 27 00 * * ? (America/New_York)).

Recurrence

Recurrence type (for example, Normal).

SQL

The actual MaxCompute SQL content of the script.

These details are extracted using the DataWorks GetTaskInstance and ListNodes APIs and are displayed as part of the Dataflow objects in the lineage graph.

../../../_images/MaxComputeLineageScript.png

The script details include task name, status, trigger type, start/end times, schedule (cron expression), and the actual SQL content.

Performance Optimization

For large environments with many projects and tables:

  1. Enable Parallel Processing: Turn on parallel API calls in General Settings for faster extraction.

  2. Use Table Filters: Configure table name filters in General Settings to extract only relevant tables.

  3. Keep Rate Limiting Enabled: If you experience API throttling errors, ensure rate limiting remains enabled in General Settings to cap API calls at 40 QPS.

  4. Deactivate Enable Column Lineage in General Settings for faster extraction if column-level lineage is not required.