Bulk-Upload Lineage (Beta)

Alation Cloud Service Applies to Alation Cloud Service instances of Alation

Customer Managed Applies to customer-managed instances of Alation

Available from version 2024.3.1

Important

You are viewing documentation for Classic Alation.

Users with the Server Admin, Catalog Admin, and Source Admin roles have the ability to upload lineage information in bulk via the user interface. You upload lineage information using an Excel template (an .xlsx file) that you download from Alation.

You can create both single-source and cross-source lineage through bulk uploading, as well as both table- and column-level lineage. You can bulk-upload lineage for both connected and virtual sources.

Note

Server, Catalog, and Source Admins can also create lineage path by path in a visual editor, using the Manual Lineage Curation capabilities.

The ability to bulk-upload lineage is available by default on Alation Cloud Service instances.

On customer-managed instances, your instance administrator can enable this feature using alation_conf.

Note

On customer-managed instances, set the parameter alation.feature_flags.dev_enable_lineage_bulk_upload to True to enable this beta feature.

alation_conf alation.feature_flags.dev_enable_lineage_bulk_upload -s True

No restart of Alation components is required.

For help with using alation_conf, see Using alation_conf.

This will reveal the Bulk Upload user interface in Admin Settings > Lineage Settings.

To bulk-upload lineage follow these steps:

Prerequisite

Ensure that the data assets that you want to upload lineage for have been cataloged in Alation. Lineage can be uploaded for the following objects:

  • RDBMS objects

    • table (table)

    • column (attribute)

  • BI objects

    • BI report (bi_report)

    • BI report column (bi_report_column)

    • BI data source (bi_datasource)

    • BI data source column (bi_datasource_column)

  • File system objects

    • directory (directory)

    • file (file)

Step 1: Download the Bulk Upload Template

To download the template:

  1. Click the gear icon in the top right corner of the page. The Admin Settings page will open.

    ../../_images/TopNavigationBar_Gear_Neo.png
  2. In the search bar on top of the Admin Settings page, search for Lineage Settings. Alternatively, you can locate the Lineage Settings tile under Platform Settings. Click Lineage Settings to open that page.

  1. Click the three gears icon in the top right corner of the page.

  2. Depending on your role, open the lineage settings page:

    • Server Admins: Clicking on the three gears icon opens the Admin Settings page. In the Catalog Admin section, click Customize Catalog, and then click the Lineage tab to open lineage setting.

    • Catalog Admins: Clicking on the three gears icon opens the Admin Settings menu for Catalog Admins. Click the Customize Catalog item in the menu. You’ll be navigated to the Customize Catalog page. Click the Lineage tab to open lineage settings.

  1. Click on the Bulk Upload tab.

    ../../_images/Lineage_BulkUpload_ConfigTab.png
  2. Click Download Template. This will download the bulk_upload_template.xlsx file to your computer.

Step 2: Populate the Template with Your Lineage Data

Understand the Template Structure

The bulk_upload_template.xlsx file includes two sheets: Nodes and Paths.

Important

Don’t modify the file structure:

  • Don’t modify the sheet or column names.

  • Don’t remove or add any sheets.

  • Don’t alter the column structure by adding or removing columns.

Nodes

On this sheet, you define the source and target objects for which you’re establishing lineage. Each row defines one source or target object.

The Nodes sheet includes four columns:

Column

Is Required?

Description

Value

Example

Row ID

Yes

The row number in this sheet

An integer that is unique in this sheet

1

Object Type

Yes

Alation object type

One of: table, attribute, bi_report, bi_datasource, bi_report_column, bi_datasource_column, directory, file

table

Datasource ID

Yes

The ID of the data source, BI server, or file system that is parent to this source or target object

Integer

345

Fully Qualified Name

Yes

The fully qualified name of the source or target object

See Fully Qualified Names below

Fully Qualified Names

Object Type

Format

Example

Table

schema.table

public.customer

Column (attribute)

schema.table.column

public.customer.name

File system directory

Path of the directory under the file system delimited by /

/alation-demo-systems/alation_edw/finance

File system file

Path of the file under the file system delimited by /

/alation-demo-systems/alation_edw/finance/finance__bnk_acct_detail.csv

BI report

bi_report.external_id

bi_report.bfab072a-f74c-4cfb-9e09-dbfec1850678/58059709-ba7f-4438-8110-a3d7108607a5

BI report column

bi_report_column.external_id

bi_report_column.bfab072a-f74c-4cfb-9e09-dbfec1850678/cb3aba14-9196-4c89-8568-171ee328999d/[none:CNTY_NM:nk]

BI data source

bi_datasource.external_id

bi_datasource.bfab072a-f74c-4cfb-9e09-dbfec1850678/ba85e58c-99c4-43b2-9bd8-0b6710c0940d

BI data source column

bi_datasource_column.external_id

bi_datasource_column.bfab072a-f74c-4cfb-9e09-dbfec1850678/e2d66818-e112-441b-ae05-3b09c0a19216/[CNTY_NM]

Paths

On this sheet, you define lineage paths by matching source objects with the corresponding target objects.

Column Name

Required?

Description

Value

Example

Row ID

Yes

Row number in this sheet

An integer that is unique in this sheet

1

Source Row ID

Yes

The Row ID of the source object as it appears in the Nodes sheet

Integer

3

External ID

Yes

External ID of the dataflow object that will get created for this lineage path, in the same format as expected by the Lineage V2 API (api/<id>).

Text

api/34569244

Content

Yes

Dataflow content, for example a SQL query.

Text

CREATE TABLE DM.PRODUCT AS SELECT * FROM STAGING.tmp_product;

Target Row ID

Yes

The Row ID of the target object as it appears in the Nodes sheet

Integer

4

Constraints

The Nodes and Paths sheets allow a maximum of 200 rows each. If this number is exceeded, the upload in the user interface will fail.

Note

An instance admin can adjust this number on the Alation server using the alation_conf parameter lineage-service.max_rows_bulk_manual_lineage_upload_excel. No restart of Alation components is required.

For help with using alation_conf, see Using alation_conf.

Note

Alation Cloud Service customers can request server configuration changes through Alation Support.

Fill the Template

To fill the template:

  1. Prepare your object information. You’ll need a list of all objects that you want to connect with lineage paths, including their parent source IDs, object types, and fully qualified names. There are several ways to gather this information from the catalog:

    • Parent source ID

      • You can get the parent source ID from the URL of the corresponding catalog page:

        • Data source

          • https://mycatalog.alationcloud.com/app/data/5/overview: The number that comes after /data/ is the data source ID. In this example, it’s 5.

        • BI source

          • https://mycatalog.alationcloud.com/app/bi_server/1/overview: The number that comes after /bi_server/ is the BI source ID. In this example, it’s 1.

        • File system source

          • https://mycatalog.alationcloud.com/app/filesystem/6/overview: The number that comes after /filesystem/ is the file system source ID. In this example, it’s 6.

    • Object type

      • Match the objects you’re uploading lineage for with one of the following types: table, attribute, bi_report, bi_datasource, bi_report_column, bi_datasource_column, directory, file.

    • Fully-qualified names

      • Tables and columns

      • BI objects

        • You can get the information needed to build fully qualified names from the Business Intelligence APIs (GBM V2). This API returns values for the external_id property that you need to provide as part of the fully qualified name. Refer to the table Fully Qualified Names for examples.

      • File system objects

        • An API isn’t available. You’ll need to collect the necessary information from the Alation user interface. For example, the path to file system objects can be obtained from the Within column of the tabular view of the Alation Search results.

  2. Decide how you want to portion your lineage upload, depending on how much lineage data you’re uploading. You may want to divide those objects into batches and create a separate upload file for each batch.

  3. Open the bulk_upload_template.xlsx file. There is no prescribed scenario for how to fill this template in the beta version of this feature. You can do it in any way you consider convenient. The steps below offer an example of how to fill it manually.

  4. Open the Nodes sheet.

  5. Fill in the rows for the first lineage path by listing all the source and target objects. For example:

    Row ID

    Object Type

    Datasource ID

    Fully Qualified Name

    1

    table

    5433

    staging.prospects

    2

    table

    32

    customer.prospects

  6. Switch to the Paths sheet.

  7. Fill in the lineage information for the objects you listed on the Nodes sheet organizing them into a lineage path: designate one object as the source lineage object and the other as the target lineage object. For example:

    Row ID

    Source Row ID

    External ID

    Content

    Target Row ID

    1

    1

    api/5433

    SQL query

    2

  8. Remember to save the changes to the file as you’re filling it. When you have matched all the objects in the Nodes sheet through paths in the Paths sheet, you are ready to upload lineage to Alation.

Step 3: Upload the Template to Alation

To upload your populated template:

  1. Open the Admin Settings > Lineage Settings page.

  2. Click the Bulk Upload tab.

  3. On the right of the Upload Process History section, click Upload.

  4. Select your file and upload it. If there are no validation errors, your file will be uploaded and you will see a success message. If the upload ends in an error, see Troubleshoot Lineage Bulk-Upload for troubleshooting information.

View Upload Job Details

Every time you upload the template, you start an upload job. The technical details for the job are logged in the Job History table at the bottom of the Bulk Upload page.

To view your upload job details:

  1. In the Upload Process History table at the bottom of the page, click View Details for the job you’re interested to view.

    ../../_images/Lineage_BulkUpload_JobHistory.png
  2. The Job details pop-up box opens with the summary of the job status.

    ../../_images/Lineage_BulkUpload_JobDetails.png
  3. Click the Export logs link at the bottom of the popup to download a more detailed job log in the CSV format. This can be useful if the upload results in errors.

Troubleshoot Lineage Bulk-Upload

You may encounter these errors during the upload:

Upload Validations

  • Only xlsx file allowed

    • Ensure you’re uploading a file with the .xlsx extension.

  • Invalid Excel Template. Sheet names are not correct.

  • Invalid Excel Template. Column headers of nodes sheet are not correct.

    • Ensure the column names are correct. Don’t change any of the column names. They must be the same as in the template you downloaded from Alation. See Understand the Template Structure. Make sure that no additional columns have been added and none have been removed from the template.

  • Invalid Excel File. Max rows limit 200 exceeded

    • You can’t include more than 200 Row IDs per sheet (default). See Constraints.

Job Validations

  • Row ID ‘abc’ should be a number

    • Ensure the values in the Row ID columns on both the sheets are unique integers.

  • Path Row ID ‘abc’ should be unique

    • Ensure the values in the Row ID columns on both the sheets are unique integers.

  • External ID ‘abc’ is invalid

    • The fully qualified name is incorrect. Check the unique identifier included into the fully qualified name.

  • Source Row ID ‘abc’ is invalid

    • Ensure the values in the Row ID columns on both the sheets are unique integers.

  • Source Node ID <N> doesn’t exist in Nodes sheet

    • Ensure that this Source Row ID exists on the Nodes sheet.

  • Source Row ID <N> has error in Nodes Sheet

    • Ensure that the Row ID is a unique integer.

  • Target Row ID ‘abc’ is invalid

    • Ensure the values in the Row ID columns on both the sheets are unique integers.

  • Target Node ID <N> doesn’t exist in Nodes Sheet

    • Ensure that the Target Row ID exists on the Nodes sheet.

  • Target Row ID <N> has error in Nodes Sheet

    • Ensure that the Row ID is a unique integer.