Explore Lineage¶
Applies to: Alation Cloud Service instances and customer-managed instances of Alation
Lineage is data about the origin of data and its movement through an organization’s data ecosystem. Lineage documents how target data objects are created from source data objects. Lineage is visually represented as a chart on the Lineage tab of a data source, BI source, or file system. Lineage charts frequently include dataflow objects, which can be used to document:
ETL and ELT processes
Stored procedures
SQL queries
Scripts that transform source data into target data
The lineage chart brings together a target data object, its upstream sources, and the dataflow objects that track its movement, to fully represent the data ecosystem.
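One way to picture the chart described above is as a directed graph whose nodes are data objects and dataflow objects. The sketch below is only an illustration of that idea, not Alation's implementation: the object names, the `dataflow:` prefix, and both helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical lineage edges as (source, target) pairs.
# The dataflow object sits between the source tables and the target table.
EDGES = [
    ("raw.orders", "dataflow:nightly_etl"),
    ("raw.customers", "dataflow:nightly_etl"),
    ("dataflow:nightly_etl", "mart.order_summary"),
]


def build_upstream_index(edges):
    """Map each node to the set of nodes that feed directly into it."""
    upstream = defaultdict(set)
    for source, target in edges:
        upstream[target].add(source)
    return upstream


def all_upstream(node, upstream):
    """Return every node reachable by walking edges backwards from `node`."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in upstream.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


upstream_index = build_upstream_index(EDGES)
lineage = all_upstream("mart.order_summary", upstream_index)
print(sorted(lineage))
# ['dataflow:nightly_etl', 'raw.customers', 'raw.orders']
```

Walking the edges backwards from the target collects both the upstream sources and the dataflow object that connects them, which is the same set of objects a lineage chart displays for that target.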
From version 2023.3, lineage can be displayed in either of two views: a classic view or a compound layout view. For more information, see Analyze the Lineage Chart.
Lineage Architecture¶
The lineage framework in Alation is built on Lineage V3, also called the lineage service, which was introduced in version 2021.4. The lineage service is a microservice running inside the Alation server that is responsible for creating, storing, and retrieving lineage data for the catalog.
The Alation server creates lineage data from multiple sources, such as metadata extraction (MDE), query log ingestion (QLI), Compose query history, and public APIs. Lineage events generated from these sources are sent to the lineage service via the Event Bus. In the lineage service:
The lineage write service consumes lineage events from the Event Bus and stores the lineage data in the lineage database.
The lineage read service retrieves the stored lineage data and powers the lineage diagrams in the Alation user interface.
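The write and read paths described above can be sketched as a small in-memory pipeline. This is a conceptual model under stated assumptions, not the lineage service itself: `LineageEvent`, `EventBus`, `LineageStore`, `write_service`, and `read_service` are hypothetical names, and a simple queue and list stand in for the real Event Bus and lineage database.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageEvent:
    """Hypothetical event: one source-to-target link and where it came from."""
    origin: str  # for example "MDE", "QLI", "Compose", or "API"
    source: str
    target: str


class EventBus:
    """Stand-in for the Event Bus: a simple first-in, first-out queue."""

    def __init__(self):
        self._queue = deque()

    def publish(self, event):
        self._queue.append(event)

    def drain(self):
        while self._queue:
            yield self._queue.popleft()


class LineageStore:
    """Stand-in for the lineage database: a list of (source, target) edges."""

    def __init__(self):
        self.edges = []


def write_service(bus, store):
    """Consume all pending events and persist them; return how many were stored."""
    count = 0
    for event in bus.drain():
        store.edges.append((event.source, event.target))
        count += 1
    return count


def read_service(store, target):
    """Return the direct sources recorded for `target`, in insertion order."""
    return [source for source, tgt in store.edges if tgt == target]


bus = EventBus()
bus.publish(LineageEvent("QLI", "raw.orders", "mart.order_summary"))
bus.publish(LineageEvent("MDE", "raw.customers", "mart.order_summary"))

store = LineageStore()
written = write_service(bus, store)
sources = read_service(store, "mart.order_summary")
print(written, sources)
```

The point of the separation is that producers (MDE, QLI, Compose, APIs) only publish events, while the read path serves diagrams from stored data without touching the producers.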
The image below illustrates the lineage architecture for a customer-managed Alation instance.
Types of Lineage¶
There are two main types of lineage: table-level and column-level. Table-level lineage is the more common type because every lineage extraction method can produce it. Column-level lineage depends on both the data source and its connector, and is calculated only for sources whose connectors support it. For a complete list of data sources that support column-level lineage, see the Support Matrix for your Alation version.
Both table-level and column-level lineage can be created automatically, manually, or through the API, as described in the following sections.
Automatic Lineage¶
Alation automatically calculates lineage using metadata sourced from metadata extraction (MDE), query log ingestion (QLI), and Compose queries. For most data sources, automatic lineage calculation requires query history data extracted and ingested with QLI. Lineage from Compose only exposes data transformations done through Alation’s Compose. Some data sources, for example, SAP HANA and Databricks Unity Catalog, support direct lineage extraction, which is lineage data extracted from system tables during MDE.
Manual Lineage¶
Users can create and edit lineage charts manually in the Alation interface using the capabilities of the Manual Lineage feature. Learn more in Create Lineage Data Manually.
Creating Lineage via the API¶
Alation provides a public API to create and update lineage data in the data catalog. The Lineage API documentation can be found on the Developer Portal: Lineage APIs.
For a quick start guide to lineage APIs, see Lineage - General API Quick Start Guide
For a quick start guide to dataflow object APIs, see Lineage - Dataflow Quick Start
For frequently asked questions about lineage and dataflow objects, see Lineage & Dataflow - Frequently Asked Questions
Enabling Column-Level Lineage¶
For most connectors that support column-level lineage, column-level lineage is not calculated by default. You must first enable automatic extraction by setting a feature flag similar to the following on the Feature Configuration tab of Alation’s Admin Settings page:
If you still do not see column-level lineage, check with your Alation account manager to ensure that column-level lineage for the specified connector is part of your Alation license entitlement.
Column-Level Lineage from Custom SQL¶
Applies from version 2024.1.2
In BI systems like Tableau or Power BI, analysts often create data sources and datasets using SQL queries that transform source data for specific analyses. Starting with version 2024.1.2, Alation automatically captures such SQL queries and generates column-level lineage detailing data transformations between source and target systems. The lineage supports all types of SQL operations that can be used to create BI data sources, for example, SELECT *, CREATE AS SELECT, joins, and unions.
Some conditions must be met for users to see lineage for SQL query-based BI data sources:
Upstream data sources must be cataloged in Alation using the appropriate OCF connector.
Upstream data sources and downstream BI sources must support column-level lineage, which needs to be enabled in the catalog.
Cross-source lineage must be configured either on the BI source or the data source.
Note
Cross-source lineage is a configuration that establishes a mapping between sources in the catalog. It enables Alation to identify (resolve) lineage objects more accurately and generate upstream lineage that traces data flows from one source to another. Without proper cross-source lineage configuration, upstream objects might not be identified correctly and could appear as temporary (TMP) nodes on lineage charts. OCF connector documentation for the connector you’re using will contain information on how to configure cross-source lineage if it’s supported by the connector.
Catalog users can find the SQL queries used to create a BI data source on the Connections tab of the catalog page of the BI datasource or BI dataset object.
Note
Lineage from SQL query-based BI data sources is generated automatically and does not require additional enablement on an instance. However, users with the Server Admin role may need to be aware of two alation_conf feature flags that control this feature on an instance. Both are set to True (enabled) by default:
alation.resolution.DEV_no_hostport_lineage_resolution: Enables cross-source lineage when the target system doesn’t have the host and port information of the source system.
alation.resolution.DEV_sql_cll: Enables column-level lineage for the BI data source type of SQL query between BI and RDBMS systems.
Known Issues with Custom SQL Lineage¶
When a column name is adjusted on the BI server by the user or due to auto-formatting, Alation can’t trace column-level lineage for the affected column.
When the custom SQL for a BI data source is modified on the BI server and columns are removed, this change will not be reflected in the lineage charts in Alation after a subsequent extraction. Columns remain in the lineage charts as previously extracted.
Column-Level Lineage for Temporary Objects¶
Applies from version 2024.1.2, Alation Cloud Service instances on cloud-native architecture
Permanent data objects are often created through transformations that use temporary tables. For example, temporary tables are commonly created in data transformation pipelines, such as those in dbt. Alation identifies a table in lineage as temporary if it meets any of the following criteria:
It is created using either the TRANSIENT or TEMPORARY keyword:
CREATE TEMPORARY TABLE
CREATE OR REPLACE TEMPORARY TABLE
CREATE TRANSIENT TABLE
CREATE OR REPLACE TRANSIENT TABLE
It is created and dropped within the same session.
It appears in ingested queries but isn’t cataloged within Alation.
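The three criteria above combine with a logical OR: meeting any one of them is enough. The sketch below illustrates that rule only; it is not Alation's parser. The `TableObservation` record, its field names, and the `is_temporary` function are hypothetical, and the regular expression simply matches the four CREATE forms listed above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches the four CREATE forms listed above (case-insensitive).
TEMP_DDL = re.compile(
    r"^\s*CREATE\s+(?:OR\s+REPLACE\s+)?(?:TEMPORARY|TRANSIENT)\s+TABLE\b",
    re.IGNORECASE,
)


@dataclass
class TableObservation:
    """Hypothetical record of what is known about one table."""
    create_statement: str              # empty string if no CREATE was seen
    created_in_session: Optional[str]  # session ID of the CREATE, if known
    dropped_in_session: Optional[str]  # session ID of the DROP, if known
    is_cataloged: bool                 # True if the table exists in the catalog


def is_temporary(obs: TableObservation) -> bool:
    """Return True if the table meets any of the three criteria."""
    created_with_keyword = bool(TEMP_DDL.match(obs.create_statement))
    same_session_lifecycle = (
        obs.created_in_session is not None
        and obs.created_in_session == obs.dropped_in_session
    )
    uncataloged = not obs.is_cataloged
    return created_with_keyword or same_session_lifecycle or uncataloged


staging = TableObservation(
    "CREATE OR REPLACE TRANSIENT TABLE stg AS SELECT 1", "s1", None, True
)
scratch = TableObservation("CREATE TABLE scratch (id INT)", "s2", "s2", True)
unknown = TableObservation("", None, None, False)
permanent = TableObservation("CREATE TABLE sales (id INT)", "s3", None, True)

for obs in (staging, scratch, unknown, permanent):
    print(is_temporary(obs))
```

Here `staging` qualifies by keyword, `scratch` by being created and dropped in the same session, and `unknown` by not being cataloged, while `permanent` meets none of the criteria.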
On lineage charts, temporary tables are marked with a TMP badge.
By default, Alation’s lineage parser does not detect column information for temporary tables, displaying lineage only at the table level on lineage charts. This applies when a permanent cataloged object (table or view) includes temporary tables in its lineage.
Starting with version 2024.1.2, column-level lineage for temporary tables can be enabled on Alation Cloud Service instances on cloud-native architecture.
Note
On customer-managed (on-premise) instances, lineage for temporary tables is limited to the table level. Column-level details for temporary tables are not available. No additional configuration is required.
When enabled, users can trace lineage from temporary columns to columns in a permanent table or view cataloged in Alation. The columns of temporary tables or views are also labeled as temporary. The temporary columns will be visible on lineage charts only if they were explicitly specified in queries. The relevant queries will be captured as part of the dataflow content.
Note
For SELECT * queries, column-level lineage is visible only from queries of the type CREATE TABLE AS SELECT (DISTINCT) * FROM. Column-level lineage will not be shown for SELECT * or INSERT INTO queries, such as:
SELECT * INTO <table A> FROM <table B>
INSERT INTO <table A> (<col1>, <col2>) SELECT * FROM <table B>
INSERT INTO <table A> SELECT * FROM <table B> WHERE <filter>
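The rule in this note can be expressed as a small pattern check. The sketch below is a rough illustration using regular expressions, not the actual lineage parser. The function name and return convention are hypothetical, and accepting the optional OR REPLACE, TEMPORARY, and TRANSIENT modifiers in the CREATE statement is an assumption that the note does not confirm.

```python
import re

# Any statement containing SELECT * or SELECT DISTINCT *.
SELECT_STAR = re.compile(r"\bSELECT\s+(?:DISTINCT\s+)?\*", re.IGNORECASE)

# CREATE TABLE <name> AS SELECT [DISTINCT] * FROM ...
# Assumption: optional OR REPLACE / TEMPORARY / TRANSIENT modifiers are accepted.
CTAS_STAR = re.compile(
    r"^\s*CREATE\s+(?:OR\s+REPLACE\s+)?(?:(?:TEMPORARY|TRANSIENT)\s+)?"
    r"TABLE\s+\S+\s+AS\s+SELECT\s+(?:DISTINCT\s+)?\*\s+FROM\b",
    re.IGNORECASE,
)


def select_star_column_lineage(sql):
    """Classify a query against the SELECT * rule.

    Returns None if the query does not use SELECT * (the rule does not apply),
    True if it is a CREATE TABLE AS SELECT * query, and False otherwise.
    """
    if not SELECT_STAR.search(sql):
        return None
    return bool(CTAS_STAR.match(sql))


examples = [
    "CREATE TABLE a AS SELECT * FROM b",
    "CREATE TABLE a AS SELECT DISTINCT * FROM b",
    "SELECT * INTO a FROM b",
    "INSERT INTO a (c1, c2) SELECT * FROM b",
    "INSERT INTO a SELECT * FROM b WHERE x > 1",
    "SELECT c1 FROM b",
]
results = [select_star_column_lineage(sql) for sql in examples]
print(results)
# [True, True, False, False, False, None]
```

A regular expression cannot handle comments, nested queries, or quoted identifiers containing spaces, so this only serves to make the rule concrete.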
Enable Column-Level Lineage for Temporary Objects¶
To activate column-level lineage for temporary tables on Alation Cloud Service instances, submit a request to Alation Support. They will enable the dedicated configuration flags on your Alation instance. These flags additionally enable lineage tracing from ALTER TABLE queries. For more details, see Lineage from ALTER TABLE Queries.
Lineage from ALTER TABLE Queries¶
Applies from version 2024.1.2
Starting with version 2024.1.2, you can see lineage from ALTER TABLE RENAME queries on lineage charts, provided the ALTER query is executed in the same session as the corresponding CREATE TABLE query.
On customer-managed (on-premise) instances and Alation Cloud Service instances that haven’t been migrated to the cloud-native architecture, lineage from ALTER TABLE queries is enabled by default. Lineage visualization is available at the table level only.
On Alation Cloud Service instances on cloud-native architecture, lineage from ALTER TABLE queries must be additionally enabled. Lineage visualization is available at the column level. You can request these configuration changes through Alation Support.
Note
Enabling lineage from the ALTER TABLE queries also enables Column-Level Lineage for Temporary Objects.
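The same-session condition for ALTER TABLE RENAME queries stated at the start of this section can be illustrated with a short sketch over a query log. This is an assumption-laden model, not how Alation processes queries: the `(session_id, sql)` log format, the `rename_edges` function, the `ALTER TABLE ... RENAME TO ...` syntax, and the idea of recording a rename as an edge from the old name to the new name are all hypothetical.

```python
import re

CREATE_RE = re.compile(r"^\s*CREATE\s+TABLE\s+([\w.]+)", re.IGNORECASE)
RENAME_RE = re.compile(
    r"^\s*ALTER\s+TABLE\s+([\w.]+)\s+RENAME\s+TO\s+([\w.]+)", re.IGNORECASE
)


def rename_edges(query_log):
    """Return (old_name, new_name) pairs for renames whose CREATE TABLE
    was seen earlier in the same session."""
    created = set()  # (session_id, table_name) pairs seen so far
    edges = []
    for session_id, sql in query_log:
        create_match = CREATE_RE.match(sql)
        if create_match:
            created.add((session_id, create_match.group(1).lower()))
            continue
        rename_match = RENAME_RE.match(sql)
        if rename_match:
            old_name = rename_match.group(1).lower()
            new_name = rename_match.group(2).lower()
            # Only renames in the creating session produce lineage.
            if (session_id, old_name) in created:
                edges.append((old_name, new_name))
    return edges


log = [
    ("s1", "CREATE TABLE stage_orders AS SELECT id FROM raw_orders"),
    ("s1", "ALTER TABLE stage_orders RENAME TO orders_clean"),
    ("s2", "CREATE TABLE stage_users (id INT)"),
    ("s3", "ALTER TABLE stage_users RENAME TO users_clean"),
]
edges = rename_edges(log)
print(edges)
# [('stage_orders', 'orders_clean')]
```

The rename in session `s3` is ignored because the matching CREATE TABLE ran in session `s2`, which mirrors the condition that both queries must be executed in the same session.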