Bulk-Upload Lineage (Beta)¶
Alation Cloud Service Applies to Alation Cloud Service instances of Alation
Customer Managed Applies to customer-managed instances of Alation
Available from version 2024.3.1
Important
You are viewing documentation for Classic Alation.
Users with the Server Admin, Catalog Admin, and Source Admin roles have the ability to upload lineage information in bulk via the user interface. You upload lineage information using an Excel template (an .xlsx file) that you download from Alation.
You can create both single-source and cross-source lineage through bulk uploading, as well as both table- and column-level lineage. You can bulk-upload lineage for both connected and virtual sources.
Note
Server, Catalog, and Source Admins can also create lineage path by path in a visual editor, using the Manual Lineage Curation capabilities.
The ability to bulk-upload lineage is available by default on Alation Cloud Service instances.
On customer-managed instances, your instance administrator can enable this feature using alation_conf.
Note
On customer-managed instances, set the parameter
alation.feature_flags.dev_enable_lineage_bulk_upload
toTrue
to enable this beta feature.alation_conf alation.feature_flags.dev_enable_lineage_bulk_upload -s TrueNo restart of Alation components is required.
For help with using alation_conf, see Using alation_conf.
This will reveal the Bulk Upload user interface in Admin Settings > Lineage Settings.
To bulk-upload lineage follow these steps:
Prerequisite¶
Ensure that the data assets that you want to upload lineage for have been cataloged in Alation. Lineage can be uploaded for the following objects:
RDBMS objects
table (
table
)column (
attribute
)
BI objects
BI report (
bi_report
)BI report column (
bi_report_column
)BI data source (
bi_datasource
)BI data source column (
bi_datasource_column
)
File system objects
directory (
directory
)file (
file
)
Step 1: Download the Bulk Upload Template¶
To download the template:
Click the gear icon in the top right corner of the page. The Admin Settings page will open.
In the search bar on top of the Admin Settings page, search for Lineage Settings. Alternatively, you can locate the Lineage Settings tile under Platform Settings. Click Lineage Settings to open that page.
Click the three gears icon in the top right corner of the page.
Depending on your role, open the lineage settings page:
Server Admins: Clicking on the three gears icon opens the Admin Settings page. In the Catalog Admin section, click Customize Catalog, and then click the Lineage tab to open lineage setting.
Catalog Admins: Clicking on the three gears icon opens the Admin Settings menu for Catalog Admins. Click the Customize Catalog item in the menu. You’ll be navigated to the Customize Catalog page. Click the Lineage tab to open lineage settings.
Click on the Bulk Upload tab.
Click Download Template. This will download the bulk_upload_template.xlsx file to your computer.
Step 2: Populate the Template with Your Lineage Data¶
Understand the Template Structure¶
The bulk_upload_template.xlsx file includes two sheets: Nodes and Paths.
Important
Don’t modify the file structure:
Don’t modify the sheet or column names.
Don’t remove or add any sheets.
Don’t alter the column structure by adding or removing columns.
Nodes¶
On this sheet, you define the source and target objects for which you’re establishing lineage. Each row defines one source or target object.
The Nodes sheet includes four columns:
Column |
Is Required? |
Description |
Value |
Example |
---|---|---|---|---|
Row ID |
Yes |
The row number in this sheet |
An integer that is unique in this sheet |
|
Object Type |
Yes |
Alation object type |
One of: |
|
Datasource ID |
Yes |
The ID of the data source, BI server, or file system that is parent to this source or target object |
Integer |
|
Fully Qualified Name |
Yes |
The fully qualified name of the source or target object |
See Fully Qualified Names below |
Fully Qualified Names¶
Object Type |
Format |
Example |
---|---|---|
Table |
|
|
Column ( |
|
|
File system directory |
Path of the directory under the file system delimited by |
|
File system file |
Path of the file under the file system delimited by |
|
BI report |
|
|
BI report column |
|
|
BI data source |
|
|
BI data source column |
|
|
Paths¶
On this sheet, you define lineage paths by matching source objects with the corresponding target objects.
Column Name |
Required? |
Description |
Value |
Example |
---|---|---|---|---|
Row ID |
Yes |
Row number in this sheet |
An integer that is unique in this sheet |
|
Source Row ID |
Yes |
The Row ID of the source object as it appears in the Nodes sheet |
Integer |
|
External ID |
Yes |
External ID of the dataflow object that will get created for this lineage path, in the same format as expected by the Lineage V2 API ( |
Text |
|
Content |
Yes |
Dataflow content, for example a SQL query. |
Text |
|
Target Row ID |
Yes |
The Row ID of the target object as it appears in the Nodes sheet |
Integer |
|
Constraints¶
The Nodes and Paths sheets allow a maximum of 200
rows each. If this number is exceeded, the upload in the user interface will fail.
Note
An instance admin can adjust this number on the Alation server using the alation_conf parameter lineage-service.max_rows_bulk_manual_lineage_upload_excel
. No restart of Alation components is required.
For help with using alation_conf, see Using alation_conf.
Note
Alation Cloud Service customers can request server configuration changes through Alation Support.
Fill the Template¶
To fill the template:
Prepare your object information. You’ll need a list of all objects that you want to connect with lineage paths, including their parent source IDs, object types, and fully qualified names. There are several ways to gather this information from the catalog:
Parent source ID
You can get the parent source ID from the URL of the corresponding catalog page:
Data source
https://mycatalog.alationcloud.com/app/data/5/overview
: The number that comes after/data/
is the data source ID. In this example, it’s5
.
BI source
https://mycatalog.alationcloud.com/app/bi_server/1/overview
: The number that comes after/bi_server/
is the BI source ID. In this example, it’s1
.
File system source
https://mycatalog.alationcloud.com/app/filesystem/6/overview
: The number that comes after/filesystem/
is the file system source ID. In this example, it’s6
.
Object type
Match the objects you’re uploading lineage for with one of the following types:
table
,attribute
,bi_report
,bi_datasource
,bi_report_column
,bi_datasource_column
,directory
,file
.
Fully-qualified names
Tables and columns
You can Download a Data Dictionary for the data source and get the fully qualified object names from the dictionary file.
You can get the fully qualified names by retrieving the list of objects using Alation’s public API: Relational Integration API. Refer to the table Fully Qualified Names for examples.
BI objects
You can get the information needed to build fully qualified names from the Business Intelligence APIs (GBM V2). This API returns values for the
external_id
property that you need to provide as part of the fully qualified name. Refer to the table Fully Qualified Names for examples.
File system objects
An API isn’t available. You’ll need to collect the necessary information from the Alation user interface. For example, the path to file system objects can be obtained from the Within column of the tabular view of the Alation Search results.
Decide how you want to portion your lineage upload, depending on how much lineage data you’re uploading. You may want to divide those objects into batches and create a separate upload file for each batch.
Open the bulk_upload_template.xlsx file. There is no prescribed scenario for how to fill this template in the beta version of this feature. You can do it in any way you consider convenient. The steps below offer an example of how to fill it manually.
Open the Nodes sheet.
Fill in the rows for the first lineage path by listing all the source and target objects. For example:
Row ID
Object Type
Datasource ID
Fully Qualified Name
1
table
5433
staging.prospects
2
table
32
customer.prospects
Switch to the Paths sheet.
Fill in the lineage information for the objects you listed on the Nodes sheet organizing them into a lineage path: designate one object as the source lineage object and the other as the target lineage object. For example:
Row ID
Source Row ID
External ID
Content
Target Row ID
1
1
api/5433
SQL query
2
Remember to save the changes to the file as you’re filling it. When you have matched all the objects in the Nodes sheet through paths in the Paths sheet, you are ready to upload lineage to Alation.
Step 3: Upload the Template to Alation¶
To upload your populated template:
Open the Admin Settings > Lineage Settings page.
Click the Bulk Upload tab.
On the right of the Upload Process History section, click Upload.
Select your file and upload it. If there are no validation errors, your file will be uploaded and you will see a success message. If the upload ends in an error, see Troubleshoot Lineage Bulk-Upload for troubleshooting information.
View Upload Job Details¶
Every time you upload the template, you start an upload job. The technical details for the job are logged in the Job History table at the bottom of the Bulk Upload page.
To view your upload job details:
In the Upload Process History table at the bottom of the page, click View Details for the job you’re interested to view.
The Job details pop-up box opens with the summary of the job status.
Click the Export logs link at the bottom of the popup to download a more detailed job log in the CSV format. This can be useful if the upload results in errors.
Troubleshoot Lineage Bulk-Upload¶
You may encounter these errors during the upload:
Upload Validations¶
Only xlsx file allowed
Ensure you’re uploading a file with the .xlsx extension.
Invalid Excel Template. Sheet names are not correct.
Ensure the sheet names are Nodes and Paths. See Understand the Template Structure.
Invalid Excel Template. Column headers of nodes sheet are not correct.
Ensure the column names are correct. Don’t change any of the column names. They must be the same as in the template you downloaded from Alation. See Understand the Template Structure. Make sure that no additional columns have been added and none have been removed from the template.
Invalid Excel File. Max rows limit 200 exceeded
You can’t include more than 200 Row IDs per sheet (default). See Constraints.
Job Validations¶
Row ID ‘abc’ should be a number
Ensure the values in the Row ID columns on both the sheets are unique integers.
Path Row ID ‘abc’ should be unique
Ensure the values in the Row ID columns on both the sheets are unique integers.
External ID ‘abc’ is invalid
The fully qualified name is incorrect. Check the unique identifier included into the fully qualified name.
Source Row ID ‘abc’ is invalid
Ensure the values in the Row ID columns on both the sheets are unique integers.
Source Node ID <N> doesn’t exist in Nodes sheet
Ensure that this Source Row ID exists on the Nodes sheet.
Source Row ID <N> has error in Nodes Sheet
Ensure that the Row ID is a unique integer.
Target Row ID ‘abc’ is invalid
Ensure the values in the Row ID columns on both the sheets are unique integers.
Target Node ID <N> doesn’t exist in Nodes Sheet
Ensure that the Target Row ID exists on the Nodes sheet.
Target Row ID <N> has error in Nodes Sheet
Ensure that the Row ID is a unique integer.