Configure Cross-Source Lineage

Alation Cloud Service Applies to Alation Cloud Service instances of Alation

Customer Managed Applies to customer-managed instances of Alation

Alation can build and display lineage that traces data movement across sources in the catalog. Such lineage is known as cross-source lineage. Proper configuration of cross-source lineage allows Alation to accurately identify objects related through lineage and prevents the creation of temporary nodes on lineage charts.

For example, if you have cataloged an RDBMS database as a data source and a BI server that uses data from this database as a BI source, lineage charts for objects under these sources can be configured to show lineage across both of them. As a result, you’ll be able to trace BI dashboards, reports, and report fields to the relevant RDBMS tables and columns in the upstream lineage.

Another example is lineage between data sources. You can catalog several databases from the same data warehouse as different data sources. Tables and views in these databases may be related through lineage, and configuring cross-source lineage will surface those lineage paths on lineage charts.

Configure Cross-Source Lineage Between BI and Data Sources

You have two options to configure cross-source lineage between a BI source and a data source that depend on whether you have the host and port information for the BI data sources:

Find the BI Connection Information

To find the BI connection information:

  1. Open the catalog page of the specific BI source. This is the BI source that uses the data from the data source you want to establish lineage to.

  2. Find the connection string for the BI data source as it appears in the BI source metadata. There are two ways to locate this information. Both ways will lead you to the BI datasource connection catalog object that stores the necessary connection string. The screenshot below shows an example of a BI data source connection catalog page and the location of the connection string you need.

    ../../_images/OCF_BIConnectionInfo_Example.png

To locate the connection information you can Start on the BI Source Catalog Page or Start on the BI Object Page.

Start on the BI Source Catalog Page

  1. Open the catalog page of the BI source.

  2. On the BI source page, click the DataSources tab or the DataSet tab, depending on which tab name you find under your BI source. This tab lists all BI data source connections from all BI objects under this BI source. Depending on how much BI metadata you have in the catalog, the list can be quite long. If you know the name of the connection you are looking for, use the Filter on top right of the data sources or datasets list to find the object.

    ../../_images/OCF_BIConnectionInfo_UseFilter.png
  3. After locating the BI data source or dataset, in the Name column, click its name. The corresponding catalog page will open.

  4. Click the Connections tab. This tab lists all data source connections under a BI data source object.

  5. On the right, under the Properties section, locate the field Database Connection. This field will contain the connection string that you need.

  6. If the string exists and includes the host and port information, you can go with the option Configure Lineage When the BI Connection Host and Port Is Known. If the string doesn’t exist or the host and port information isn’t available, use the option Configure Lineage When the BI Connection Host and Port Isn’t Known.

Start on the BI Object Page

  1. Open the catalog page of the BI source.

  2. Drill down to the BI object that uses the connection you’re looking for. For each BI source type, the hierarchy of BI objects will differ. Start with a site or folder, and click through to the level that stores BI data source connections (such as project, workbook, report, or dashboard).

  3. On the BI object page, click the DataSources or DataSet tab.

    ../../_images/OCF_BIConnectionInfo_Object_Page.png
  4. The DataSources or DataSet tab will display the list of BI data sources used to create the BI report(s).

    ../../_images/OCF_BIConnectionInfo_DataSourcesList.png
  5. Click the name of the data source you need. The corresponding catalog page will open.

  6. Click the Connections tab. This tab lists all connections under a BI data source.

  7. On the right, under the Properties section, locate the field Database Connection. This field will contain the connection string that you need.

  8. If the string exists and includes the host and port information, you can go with the option Configure Lineage When the BI Connection Host and Port Is Known. If the string doesn’t exist or the host and port information isn’t available, use the option Configure Lineage When the BI Connection Host and Port Isn’t Known.

Configure Lineage When the BI Connection Host and Port Is Known

To configure cross-source lineage between a BI source and a data source when the BI data source host and port is known, you need to correctly fill out the Additional datasource connections field (or the BI Connection info field) on the General Settings tab on the settings page of the data source to which you’re establishing lineage. This field is used by the lineage service logic to identify lineage nodes that are cataloged in different sources.

Note

For some OCF connectors, the Additional datasource connections field appears as the BI Connection info field in the user interface. This is one and the same field that serves the same purpose.

To configure cross-source lineage:

  1. Copy the database connection information from the BI data source connection page. (See Find the BI Connection Information.)

  2. Open the settings of the corresponding data source and navigate to the General Settings tab.

  3. Paste the connection information into the Additional datasource connections field (or BI Connection info).

    Note

    In certain situations, the BI data source connection value may only contain the host information but no port value. However, a port is required as part of the BI Connection info or Additional datasource connections field value.

    If this is your case, add a random integer as a port number for the value to be valid. The value -1 is also acceptable. For example:

    • toro-demo.ceadil4offf2.us-west-2.rds.amazonaws.com:123

    • toro-demo.ceadil4offf2.us-west-2.rds.amazonaws.com:-1

  4. Click Save to save the value.

The next metadata extraction (MDE) will incorporate this configuration. Lineage will be generated with a sub-job during the MDE.

Configure Lineage When the BI Connection Host and Port Isn’t Known

If your BI server doesn’t provide the host and port information for BI data sources, you configure cross-source lineage in both the BI source and the data source.

Lineage can be generated during the next metadata extraction or as a separate action after you’ve performed this configuration.

Step 1: Configure the Additional Datasource Connections Field

You start with configuring the Additional datasource connections field in the settings of the RDBMS data sources that provide data for this BI source:

  1. Open the General Settings tab of the data source settings.

  2. Copy the host and port information from the URI field in the format host:port, for example: toro-demo.ceadil4offf2.us-west-2.rds.amazonaws.com:123.

  3. Locate the Additional datasource connections field:

    • For some data sources, it’s located below Step 3: Test connection in the Advanced settings (optional) section of the page.

      ../../_images/ConfigureLineage_LocateFieldNew.png
    • For some data sources, it’s located on top of the General Settings tab, under Application settings. In some connector versions, the field’s name is BI connection info.

      ../../_images/ConfigureLineage_LocateFieldOld.png
  4. Paste the host and port information into the Additional datasource connections field.

  5. Click Save under the field to save the value.

Step 2: Add Upstream Sources

Next, you switch to the BI source and define upstream data sources that provide the data for it. During extraction, Alation will automatically match the BI objects in this BI server with the upstream tables and views in the data sources you select.

To add upstream sources:

  1. Open the Lineage Settings tab of the BI source you’re configuring lineage for.

  2. Make sure the Lineage Configuration sub-tab is selected.

    ../../_images/ConfigureLineageSettingsTab.png
  3. Click Add Upstream Connections. The Add Upstream Connections dialog opens.

    ../../_images/ConfigureLineageAddUpstreamConn.png

    In this dialog:

    • The Target list is disabled and displays the name of the BI source you’re configuring.

    • The Sources list can be used to select data sources.

  4. Click the Sources list to expand it. In the drop list that opens, select the checkboxes of the upstream data sources that provide data for this BI source. You can select multiple data sources. You can use the Search box on top of the list to search and find the data sources you need.

    ../../_images/ConfigureLineageSelectSources.png

    Note

    Data sources must have been cataloged in Alation to appear as selectable options in this configuration.

    The Sources list displays a maximum of 1,000 data sources that are accessible to the user. If you don’t find the data source you want, check that you have access to this data source.

  5. After making your selections, click outside of the list to close it.

  6. Click Save in the dialog. The sources you selected appear in the Sources list under Lineage Configuration. The Generate Lineage button on top right becomes active.

    ../../_images/ConfigureLineageAddedSourcesList.png

    Note

    You can adjust your lineage settings configuration anytime by updating the list of sources.

You can generate lineage next. You can also do it later as needed. The next metadata extraction on this BI source will also incorporate this lineage configuration and generate the corresponding lineage with a sub-job.

Step 3: Generate Lineage

  1. Ensure you have at least one successful metadata extraction job from the BI server. If you haven’t, run extraction instead. The current configuration of the Lineage Settings tab will be applied during extraction.

  2. To generate lineage separately from metadata extraction, click the Generate Lineage button on top right. In the confirmation dialog that opens, click Continue. This will initiate a lineage calculation job where Alation matches the BI data souces with the data sources you’ve selected on the Lineage Settings tab and generates cross-system lineage between the BI objects and the data sources.

  3. After you confirm lineage extraction, Alation displays the associated Job ID at the bottom of the screen. Copy this Job ID and save it to a file. You may need it to troubleshoot your configuration or to pass it to Alation Support for troubleshooting.

    ../../_images/ConfigureLineageJobID.png

You can view the status of the lineage job on the Lineage Job History sub-tab.

Step 4: View the Lineage Job Status

  1. Open the Lineage Job History sub-tab to view a list of the Generate Lineage jobs.

    ../../_images/ConfigureLineageViewJob.png
  2. For the job you want to view, in the Lineage Job Status table, click on Status or View Details. This will bring up a pop-up with additional information about the job.

    ../../_images/ConfigureLineageJobDetails.png

Manage Lineage Settings

You can adjust lineage configuration anytime by changing or removing sources.

Change Sources

A source in the Upstream Connections table can be changed to another if needed:

  1. Click the Change link for the source you want to change to another source.

  2. This brings up the Change connection dialog. The source that you clicked the Change link for will be preselected in the Sources list.

  3. From the Sources list, select a different source.

  4. Click Save. The previous source will be substituted with the new source you just selected.

Remove Sources

You can remove one or all sources.

Note

This action doesn’t delete any previously populated lineage. To update your lineage information after removing sources, generate lineage or re-run metadata extraction.

To remove one source:

  1. In the Sources table, click the Remove link for the source you want to remove.

  2. In the confirmation dialog that pops up, click Continue. The source will be removed from the configuration.

To remove all sources:

  1. Click on the three dots icon on top right.

  2. In the menu that opens, click Remove All Connections.

  3. This brings up a confirmation dialog. Click Continue. All sources will be removed from the configuration.

Enable Cross-Source Lineage Between Data Sources

Available from version 2023.1.6

Starting with version 2024.1.5, cross-source lineage for data sources is enabled by default. Use the information in Configure Cross-Source Lineage Between Data Sources to configure lineage.

Configure Cross-Source Lineage Between Data Sources

To configure cross-source lineage between data sources:

  1. Provide the host and port information of the data source in the Additional datasource connections field (BI Connection info field). For example, if the URI of the relevant data source is snowflake://dbhost.com:443/?warehouse=TEST, then the information you need for the field is dbhost.com:443. Do this for all data sources that may have objects connected through lineage.

    Note

    Cross-source lineage between data sources has a dedicated alation_conf parameter. When this parameter is set to False, you will see temporary lineage nodes on lineage charts even if you configure the Additional datasource connections field (BI Connection info field) correctly. Learn more in Enable Cross-Source Lineage Between Data Sources.

  2. Run metadata extraction on all data sources you’ve configured cross-source lineage for.

  3. Run query log ingestion.

As a result of configuring the Additional datasource connections field (BI Connection info field) and after running metadata extraction and query log ingestion, you should see lineage paths between data sources. If you previously saw them as temporary (TMP), you should see the TMP badge removed. If you still see the TMP badge, refer to Troubleshooting Cross-Source Lineage Between Data Sources for troubleshooting advice.

View Cross-Source Lineage

As a result of configuring cross-source lineage, you will see lineage links across the relevant sources in your catalog.

The screenshot below shows lineage for a BI report that includes the upstream lineage links to the corresponding data source, where:

  • 1—The BI object catalog page

  • 2—This BI object represented on the Lineage diagram under the Lineage tab

  • 3—The upstream lineage that shows the source of data and comes from the data source in the catalog where the BI Connection info field stores the corresponding BI dataset connection information

    ../../_images/OCF_BIConnectionInfo_Chart.png

    Note

    The schema that was used in the BI data source or dataset must be cataloged in Alation. Excluding the required schema from extraction will create a lineage object with the TMP tag (“temporary object”) on the lineage chart.

Troubleshoot Cross-Source Lineage

Troubleshoot Cross-Source Lineage Between BI Sources and Data Sources

Problem

Tables and views in the upstream lineage for a BI source have the temporary (TMP) badge although the BI Connection info or Additional datasource connections field on the relevant data source has been configured correctly.

Solution

The schemas which were used to create the BI data source or dataset must be cataloged in Alation under the data source. If a source object was excluded from extraction, the lineage generation process will create a lineage node with the TMP badgeon lineage charts. Check that the data objects, such as tables and views, represented on lineage charts as TMP nodes have been cataloged in Alation. If not, extract them from the data source and then rerun extraction on the BI source.

Troubleshooting Cross-Source Lineage Between Data Sources

Temporary Lineage Nodes Appear on Lineage Charts

Problem

The BI Connection info or Additional datasource connections field has been configured correctly, but temporary (TMP) lineage nodes still appear on lineage charts for table and view objects.

Solution

Cross-source lineage between data sources has a dedicated alation_conf feature flag that is disabled by default. You will see temporary lineage nodes on lineage charts even if you configure the BI Connection info or Additional datasource connections field correctly. Ensure that the feature flag is set to True in your catalog instance. After enabling the flag, rerun metadata extraction and query log ingestion.

On how to enable the feature, see Enable Cross-Source Lineage Between Data Sources.

Temporary Lineage Nodes Appear for View Objects

Problem

The cross-source lineage feature flag is enabled, the data BI Connection info or Additional datasource connections field has been configured correctly, but the temporary (TMP) badges still appear on lineage charts of view objects. The TMP badges do not appear on lineage nodes of table objects.

Solution

Re-parse the lineage information for views for each data source. This action regenerates lineage for views using the configuration of the Additional datasource connections field (the BI Connection info field).

Note

Re-parsing is done on the backend of the Alation server. Alation Cloud Service admins can request assistance from Alation Support.

To re-parse the lineage for views:

  1. Find out IDs of all data sources that you’ve configured cross-source lineage for. Learn more in How to Find Data Source ID. Have the IDs at the ready.

  2. Use SSH to connect to the Alation server.

  3. Enter the Alation shell using the following command:

    sudo /etc/init.d/alation shell
    
  4. Switch the user to alation.

    sudo su alation
    
  5. Run the following code, substituting the placeholder <comma_separated_data_source_ids> with comma-separated data source IDs.

    python -m reparse_view_sql --ds_ids <comma_separated_data_source_ids>
    

    Example:

    python -m reparse_view_sql --ds_ids 1,2
    
  6. Exit from the alation user and Alation shell by using the exit command twice.

After you re-parse the views, the TMP badges should be removed from view objects on lineage charts.