Version 1.9.0 or Newer

Alation Cloud Service: Applies to Alation Cloud Service instances of Alation

Customer Managed: Applies to customer-managed instances of Alation

Important

This section applies to Alation version 2024.1.2 or later and Tableau OCF connector version 1.9.0 or later.

Overview

Metadata extraction (MDE) is the process of fetching BI source information, such as sites, projects, workbooks, views (worksheets, dashboards), fields, data sources, data source fields, databases, and tables. Alation queries your BI Server to retrieve this metadata, which becomes catalog objects.

You can initiate MDE on demand or schedule it for regular catalog updates.

Configure MDE in Alation

Configuring MDE in Alation involves the following steps:

Test Access and Fetch Projects

Before fetching projects for extraction, Alation tests whether it can reach the BI source URI to run metadata extraction.

  1. On the Settings page of your Tableau OCF BI source, go to the Metadata Extraction tab.

  2. In the Test access and fetch projects section, click Run.

    The retrieved list of projects appears in the Projects table under the Select projects for extraction section of the Metadata Extraction page.

Select Projects For Extraction

Instead of extracting all projects, you can select specific projects that you have access to. Alation then retrieves metadata only for the selected projects, which makes extraction faster and consumes fewer resources than extracting all projects.

By default, all projects that Alation fetches from the BI source are selected for extraction. You can adjust the selection by:

  • Selecting Projects using Filters

  • Selecting Projects Manually

If you do not select any projects, either manually or using filters, Alation extracts all projects when you run metadata extraction.

Select Projects using Filters

If you want to apply extraction filters, perform these steps:

  1. On the Settings page of your Tableau OCF BI source, go to the Metadata Extraction tab.

  2. Under the Select projects for extraction section, turn on the Enable advanced settings toggle.

  3. Select the required extraction filter option from the Extract drop-down list:

    • Only selected projects — extracts metadata only from the selected projects. This is the default value.

    • All projects except selected — extracts metadata from all projects except the selected projects.

  4. To soft-delete projects from a previous extraction that are not part of the current project selection, select the Keep the catalog synchronized with the current selection of projects checkbox.

  5. Create a filter.

    1. From the first drop-down list, select Projects.

    2. Select the filter criteria (Contains, Starts with, Ends with, Regex).

    3. Specify the keyword to look for in the project names.

      Use this option if you frequently change projects or if you use extensive metadata.

      You can add multiple filters by clicking the Add another filter link.

    Note

    You must use rules if you plan to schedule MDE.

  6. Click Apply filters.

    The Projects table displays the projects that match the filters you set.
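The following Python sketch illustrates how the filter criteria and the Extract options could combine into a project selection, and which previously extracted projects the Keep the catalog synchronized option would soft-delete. It is not Alation's implementation: all function names are hypothetical, and the sketch assumes that matching is case-sensitive, that Regex matches anywhere in the project name, and that a project is selected when it matches any one of the filters.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical model of one filter: (criterion, keyword), e.g. ("Starts with", "Fin").
Rule = Tuple[str, str]


def matches(name: str, rule: Rule) -> bool:
    """Return True if a project name satisfies a single filter (assumed case-sensitive)."""
    criterion, keyword = rule
    if criterion == "Contains":
        return keyword in name
    if criterion == "Starts with":
        return name.startswith(keyword)
    if criterion == "Ends with":
        return name.endswith(keyword)
    if criterion == "Regex":
        # Assumption: the pattern may match anywhere in the name.
        return re.search(keyword, name) is not None
    raise ValueError(f"unknown criterion: {criterion}")


def select_projects(projects: Iterable[str], rules: List[Rule],
                    mode: str = "Only selected projects") -> List[str]:
    """Apply the filters; a project is 'selected' if it matches ANY filter (assumption)."""
    projects = list(projects)
    selected = [p for p in projects if any(matches(p, r) for r in rules)]
    if mode == "Only selected projects":
        return selected
    if mode == "All projects except selected":
        return [p for p in projects if p not in selected]
    raise ValueError(f"unknown mode: {mode}")


def projects_to_soft_delete(previous: Iterable[str], current: Iterable[str]) -> List[str]:
    """Projects extracted before but absent from the current selection (sync checkbox on)."""
    return sorted(set(previous) - set(current))


projects = ["Finance", "Finance Archive", "Sales", "Marketing"]
rules = [("Starts with", "Fin")]
print(select_projects(projects, rules))                                  # ['Finance', 'Finance Archive']
print(select_projects(projects, rules, "All projects except selected"))  # ['Sales', 'Marketing']
```

The two Extract options are modelled as complements of the same matched set, which is why the same filter yields disjoint results in the two calls above.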

Note

After applying rules, you cannot manually adjust the selection of projects.

Select Projects Manually

If you opt to manually select the projects for extraction, perform these steps:

  1. On the Settings page of your Tableau BI source, go to the Metadata Extraction tab.

  2. Under the Select projects for extraction section, turn off the Enable advanced settings toggle if it is not already off.

  3. Select the required projects from the list of projects in the Projects table.

    Alternatively, search the table for the required projects by project name or by any keyword or string in the project name.

    After you select the projects, the number of selected projects is displayed above the Projects table.

Permission Mirroring (Optional)

You can enable or disable the permission mirroring feature to mirror user permissions from Tableau to Alation.

Important

Enabling permission mirroring significantly increases metadata extraction time, because Alation invokes the Tableau API to get permissions for each project, workbook, data source, and report.

In the User Domain Name field, enter the domain names of the Tableau users whose permissions Alation should extract, separated by commas, and click Save. Ensure that each user has the same username in Tableau and Alation.

Alation supports extracting permissions from multiple domains. Ensure that you have performed the required configuration in your Active Directory.
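To illustrate how the comma-separated User Domain Name value scopes permission mirroring, the Python sketch below keeps only Tableau users whose domain is listed and whose username also exists in Alation. The (username, domain) record shape, the function names, and the case-insensitive domain comparison are assumptions for illustration; they are not taken from the Tableau API or from Alation.

```python
from typing import Iterable, List, Set, Tuple


def parse_domains(field_value: str) -> Set[str]:
    """Split the User Domain Name field on commas, trimming spaces and ignoring empty entries."""
    return {d.strip().lower() for d in field_value.split(",") if d.strip()}


def mirrorable_users(tableau_users: Iterable[Tuple[str, str]], field_value: str,
                     alation_usernames: Set[str]) -> List[str]:
    """Usernames whose domain is listed and that match an Alation username exactly."""
    domains = parse_domains(field_value)  # assumption: domain comparison is case-insensitive
    return [user for user, domain in tableau_users
            if domain.lower() in domains and user in alation_usernames]


users = [("alice", "CORP"), ("bob", "partner.example.com"), ("carol", "other")]
print(mirrorable_users(users, "corp, partner.example.com", {"alice", "carol"}))  # ['alice']
```

In this example, bob is in a listed domain but has no matching Alation username, and carol has a matching username but is in an unlisted domain, so neither is mirrored.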

Customize Alation Certified Project Extraction (Optional)

In the Certified project suffix field, define the suffix for projects certified by Alation and click Save. If a workbook is certified in Alation, it is moved to a new project named <project_name> - <certified project suffix>, for example, Population Growth Analysis - Alation Certified.
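A minimal Python sketch of the naming rule above, assuming the project name and suffix are joined by a space, a hyphen, and a space, as in the Population Growth Analysis example. The function name is hypothetical, and the default suffix is taken from that example rather than from a documented default.

```python
def certified_project_name(project_name: str, suffix: str = "Alation Certified") -> str:
    """Name of the project that a workbook certified in Alation is moved to."""
    # Assumed format: "<project_name> - <certified project suffix>"
    return f"{project_name} - {suffix}"


print(certified_project_name("Population Growth Analysis"))
# Population Growth Analysis - Alation Certified
```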

Turn on the Propagate Alation certification to BI source toggle to propagate the Endorse, Warn, and Deprecate flags from Alation to the BI source. This toggle is disabled by default.

Customize Preview Extraction (Optional)

Important

Enabling either of these settings significantly increases metadata extraction time, because Alation invokes the Tableau API to get a preview for each dashboard and report.

Turn on the Enable previews toggle to extract previews of workbooks, reports, and dashboards as thumbnails. A workbook preview is available only if the workbook has at least one dashboard. This toggle is disabled by default.

Turn on the Enable high resolution preview toggle to extract high-resolution previews of reports and dashboards. This toggle is disabled by default.

Customize Additional Extraction Scope (Optional)

Important

Enabling report fields value sampling significantly increases metadata extraction time, because Alation invokes the Tableau API to get sample values for each report.

Turn on the Enable report fields value sampling toggle to sample the distinct values of all report columns. This toggle is disabled by default.

Turn on the Extract auto-generated embedded datasources toggle to extract auto-generated embedded data sources. This toggle is disabled by default. When a workbook connects to a published data source, Tableau creates an embedded data source as part of the workbook; Alation catalogs such embedded data sources as unpublished data sources. With this toggle on, Alation catalogs the auto-generated embedded data sources at the workbook level, based on the information in the Tableau Metadata API response. To prevent cataloging them at the workbook level, turn off the toggle.

Run Extraction

Under the Run extraction section (General Settings > Metadata Extraction), click Run Extraction to extract metadata on demand.

The status of the extraction action is logged in the Extraction Job Status table under the MDE Job History tab.

Schedule Extraction

You can also schedule the extraction. To schedule the extraction, perform these steps:

  1. On the Settings page of your Tableau OCF BI source, go to the Metadata Extraction tab.

  2. Under the Run extraction section, turn on the Enable extraction schedule toggle.

  3. Using the date and time widgets, select the recurrence period, day, and time for the MDE schedule. The next metadata extraction job for your BI source runs on the schedule you specify.


Note

Recommended schedules for better performance:

  • Every 12 hours, at the 30th minute of the hour.

  • Every 2 days at 11:30 PM.

  • Every week on Sunday and Wednesday.

  • Every 3 months on the 15th day of the month.

View the MDE Job History

You can view the status of the extraction actions after you run the extraction or after Alation triggers the MDE as per the schedule. Also, you can view the status of the projects retrieved from the Test Access and Fetch Projects step.

To view the status of extraction, go to Metadata Extraction > MDE Job History on the Settings page of your Tableau BI source. The Extraction job status table is displayed.


The Extraction job status table logs the following statuses:

  • Did Not Start - Indicates that the metadata extraction did not start due to configuration or other issues.

  • Succeeded - Indicates that the extraction was successful.

  • Partial Success - Indicates that the extraction was successful with warnings. If Alation fails to extract some of the objects during the metadata extraction process, it skips them and proceeds with the extraction process, resulting in partial success.

  • Failed - Indicates that the extraction failed with errors.

Click the View Details link to view a detailed report of metadata extraction. If there are errors, the Job errors table displays the error category, error message, and a hint (ways to resolve the issue). Follow the instructions under the Hints column to resolve the error.

In some cases, a Generate Error Report link is displayed above the Job errors table. Click it to generate an archive (.zip) containing CSV files for different error categories, such as Data and Connection errors, and then click Download Error Report to download the files.

Troubleshooting (Optional)

Use the fields in this section for MDE debugging:

Enable Raw Dump or Replay

You can enable or disable the Raw Metadata Dump or Replay feature for debugging MDE. By default, this feature is disabled. We recommend enabling it for extraction debugging only. The full use of this feature requires access to the Alation server.

If Raw Metadata Dump or Replay is enabled, Alation breaks MDE into two stages:

  • “Dump” the extracted metadata into files. You can access and review the files on the Alation server to debug extraction issues before attempting to ingest the metadata into the catalog.

  • Ingest the metadata from the files into the catalog (Replay).

You control both stages manually from the user interface.

To enable Raw Metadata Dump or Replay, perform these steps:

  1. On the Settings page of your Tableau BI source, go to the Metadata Extraction > Troubleshooting: Enable raw dump or replay section.

  2. From the Enable Raw Metadata Dump or Replay dropdown list, select the Enable Raw Metadata Dump option.

  3. Click Save.

    This enables the first stage of MDE, in which the extracted metadata is dumped into the following files in a subdirectory of /opt/alation/site/tmp/ on the Alation server (inside the Alation shell):

    BIConnection.dump, BIConnectionColumn.dump, BIDataSource.dump, BIDataSourceColumn.dump, BIFolder.dump, BIReport.dump, BIReportColumn.dump, BIUser.dump, BIPermissions.dump, and NonFatalJobError.dump.

  4. Click Run extraction.

    Alation performs a raw metadata dump into files. In the Extraction job status table on the MDE Job History tab, click the View Details link to display the details of the MDE job. The log lists the location of the .dump files for the MDE job. For example: /opt/alation/site/tmp/rosemeta/170/extraction_dump/5028.

  5. Access and review the metadata dump files to identify potential extraction issues.

  6. From the Enable Raw Metadata Dump or Replay dropdown list, select the option Enable Ingestion Replay.

  7. Click Save.

    This enables the second stage where the metadata from the files is ingested into the Alation catalog.

  8. Click Run extraction.

    The metadata from the files is ingested into the catalog.
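If you have shell access to the Alation server, a small script can help with step 5. The Python sketch below only checks which of the .dump files listed in step 3 are present in a job's dump directory and how large they are; it does not parse them, because their format is not described here. The function name is hypothetical, and a missing or empty file is a hint worth investigating, not necessarily an error.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# File names listed in step 3; their internal format is not documented here.
EXPECTED_DUMPS = [
    "BIConnection.dump", "BIConnectionColumn.dump", "BIDataSource.dump",
    "BIDataSourceColumn.dump", "BIFolder.dump", "BIReport.dump",
    "BIReportColumn.dump", "BIUser.dump", "BIPermissions.dump",
    "NonFatalJobError.dump",
]


def summarize_dump_dir(dump_dir: str) -> Tuple[Dict[str, int], List[str]]:
    """Return ({file name: size in bytes} for dumps present, [names of missing dumps])."""
    root = Path(dump_dir)
    sizes: Dict[str, int] = {}
    missing: List[str] = []
    for name in EXPECTED_DUMPS:
        path = root / name
        if path.is_file():
            sizes[name] = path.stat().st_size
        else:
            missing.append(name)
    return sizes, missing
```

For example, pass the directory reported in your job's View Details log, such as summarize_dump_dir("/opt/alation/site/tmp/rosemeta/170/extraction_dump/5028"), and then open the non-empty files that matter for the issue you are debugging.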

Project Extraction Batch Size

Specify the project extraction batch size. Although this parameter controls the batching of workbook extraction, each batch is formed based on the number of projects: the value is the number of projects for which Alation extracts ALL workbooks in one extraction batch.

For example, if you set this parameter to 5, workbooks are extracted in several batches: the first batch contains all workbooks from the first five projects, the second contains all workbooks from the next five projects, and so on until the end of the project list.
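The Python sketch below models the batching described above: projects are grouped by the batch size, and every workbook of every project in a group goes into the same extraction batch. The function names and the dictionary input are hypothetical; the sketch mirrors the description, not the connector's code.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunks(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def workbook_batches(workbooks_by_project: Dict[str, List[str]],
                     project_batch_size: int) -> Iterator[List[str]]:
    """One batch = ALL workbooks of the next `project_batch_size` projects."""
    for project_group in chunks(workbooks_by_project, project_batch_size):
        yield [wb for project in project_group for wb in workbooks_by_project[project]]


catalog = {"P1": ["wb1", "wb2"], "P2": [], "P3": ["wb3"], "P4": ["wb4", "wb5", "wb6"]}
print(list(workbook_batches(catalog, 2)))
# [['wb1', 'wb2'], ['wb3', 'wb4', 'wb5', 'wb6']]
```

The batch size bounds the number of projects per batch, not the number of workbooks: with a value of 2, the second batch above holds four workbooks because project P4 alone has three.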

Workbook Extraction Batch Size

This parameter sets the batch size for workbooks. It defines the number of workbooks Alation will process in a single batch.

For example, with a batch size of 5, the connector queries the reports, embedded data sources, and other objects for five workbooks at a time. Reducing this parameter decreases the amount of data fetched in each request to Tableau.

Published Datasource Extraction Batch Size

This parameter sets the batch size for published datasources. It defines the number of published datasources Alation will process in a single batch.

For example, with a batch size of 5, the connector queries the data source connections, data source fields, and other objects for five published data sources at a time. Reducing this parameter decreases the amount of data fetched in each request to Tableau.
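Both the Workbook Extraction Batch Size and the Published Datasource Extraction Batch Size trade the size of each request against the number of requests. The Python sketch below shows that arithmetic under the simplifying assumption of one Tableau request per batch; the function names are hypothetical, and the real connector may issue several queries per batch (for reports, fields, connections, and so on).

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(object_ids: Sequence[T], batch_size: int) -> List[List[T]]:
    """Split workbook or published data source IDs into batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch size must be at least 1")
    return [list(object_ids[i:i + batch_size]) for i in range(0, len(object_ids), batch_size)]


def batch_count(total: int, batch_size: int) -> int:
    """Number of batches, i.e. ceil(total / batch_size), using integer arithmetic."""
    return (total + batch_size - 1) // batch_size


ids = [f"wb-{n}" for n in range(12)]
print([len(b) for b in split_into_batches(ids, 5)])  # [5, 5, 2]
print(batch_count(12, 5), batch_count(12, 2))        # 3 6
```

Lowering the batch size from 5 to 2 shrinks each batch but raises the batch count from 3 to 6 for the same 12 objects, so a smaller value can help when individual requests are too large, at the cost of more round trips.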

Datasource (Published and Unpublished)

Unpublished data sources are the embedded data sources that are created as part of a workbook.

If a corresponding published data source exists, the connector extracts only the published data source. To extract both published and unpublished (embedded) data sources, turn on the Extract auto-generated embedded datasources toggle in the Customize Additional Extraction Scope (Optional) section.
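The Python sketch below models the selection rule above. The DataSource class and its generated_from field are hypothetical stand-ins for whatever the connector reads from Tableau, and the sketch assumes that an embedded data source with no published counterpart is always extracted; only embedded copies of a published data source depend on the toggle.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataSource:
    name: str
    published: bool
    # Hypothetical field: for an embedded (unpublished) source, the published source it came from.
    generated_from: Optional[str] = None


def sources_to_extract(sources: List[DataSource],
                       extract_auto_generated_embedded: bool = False) -> List[DataSource]:
    """Apply the published/unpublished rule described above (a model, not Alation's code)."""
    published_names = {s.name for s in sources if s.published}
    selected = []
    for s in sources:
        has_published_counterpart = (not s.published
                                     and s.generated_from in published_names)
        # Embedded copies of a published source are skipped unless the toggle is on.
        if has_published_counterpart and not extract_auto_generated_embedded:
            continue
        selected.append(s)
    return selected


sources = [
    DataSource("Sales", published=True),
    DataSource("Sales (embedded)", published=False, generated_from="Sales"),
    DataSource("Local CSV", published=False),
]
print([s.name for s in sources_to_extract(sources)])        # ['Sales', 'Local CSV']
print([s.name for s in sources_to_extract(sources, True)])  # ['Sales', 'Sales (embedded)', 'Local CSV']
```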

Tableau stores the data source details internally. When the connector runs a query, Tableau runs it against the extracted data, just as it does whenever a user runs a report in the Tableau UI.

Request Time Out (Seconds)

Specify the timeout, in seconds, for requests sent to Tableau.