Version 1.7.0 or Newer

Alation Cloud Service: Applies to Alation Cloud Service instances of Alation

Customer Managed: Applies to customer-managed instances of Alation

Important

This section applies to Alation version 2023.3.4 or higher and Amazon Redshift OCF connector version 1.7.0 or higher.

You can either create a table for Alation to pull the query logs from or use a custom query that retrieves the logs from the Amazon Redshift data source.

Before performing query log ingestion (QLI), perform the QLI setup. Refer to Amazon Redshift Query Log Table.

Note

Alation doesn’t ingest query statements with more than 100,000 characters. If the character count of a query exceeds this limit, the query string will be automatically truncated to 100,000 characters before it is ingested into the catalog.

Query log ingestion (QLI) extracts and ingests the query history of a database and powers the lineage, popularity, top user, and join and filter information in the catalog. Explore examples of ingested queries on schema and table catalog pages.

Configuring and running QLI involves setting up a QLI source (a table or a custom query), testing access, previewing the results, and then running QLI manually or on a schedule.

For data sources added using the Amazon Redshift OCF connector, Alation supports the following QLI configuration options:

  • Table-Based

  • Custom Query-Based

Depending on the option you choose, you must either create a QLI table in your database or write a query that follows the template Alation expects.

Configure Table-Based QLI

To configure a table-based QLI, perform these steps:

  • Create a table

  • Provide the QLI table name in Alation

Create a Table

To enable QLI in Alation, create a table in a schema of your choice using the documented template, and grant the service account used in Alation SELECT permission on this QLI table.

For more information, see Amazon Redshift Query Log Table.
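The Amazon Redshift Query Log Table page contains the authoritative DDL and the steps for populating the table. As a rough illustration only, a QLI table compatible with the query template later in this section could look like the sketch below; the column names and types are inferred from that template, and alation_service_account is a placeholder user name.

-- Hedged sketch of a QLI table; column names and types are inferred from the
-- QLI query template in this section, not copied from the documented DDL.
CREATE TABLE public.alation_qlog (
    user_name        VARCHAR(128),     -- user who ran the query
    start_time       TIMESTAMP,        -- query start time
    text             VARCHAR(65535),   -- query text (may span multiple rows)
    session_id       VARCHAR(128),     -- session identifier
    seconds_taken    DOUBLE PRECISION, -- query duration in seconds
    default_database VARCHAR(128),     -- database the query ran against
    canceled         VARCHAR(16),      -- non-empty when the query was canceled
    seq              INTEGER           -- sequence number for multi-row query text
);

-- Allow the Alation service account (placeholder name) to read the table.
GRANT SELECT ON public.alation_qlog TO alation_service_account;

Whatever DDL you use, the essential points are that the service account has SELECT on the table and that the QLI query can map the table columns to the aliases Alation expects.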

Provide the QLI Table Name

Important

The Alation user interface displays standard configuration settings for credentials and connection information stored in the Alation database. If your organization has configured Azure KeyVault or AWS Secrets Manager to hold such information, the user interface will change to include the following buttons adjacent to the respective fields:


By default, you see the user interface for Standard. In the case of Vault, instead of the actual credential information, you must select the source and provide the corresponding key. For details, see Configure Secrets for OCF Connector Settings.

  1. On the Settings page of your Amazon Redshift data source, go to the Query Log Ingestion tab.

  2. Under the Provide the QLI table name section, enter the QLI table name in the Table name field.

    Ensure that the service account has permission to access the table. A quick access check is shown after these steps.

    Important

    Use the format schema_name.table_name, for example, public.alation_qlog.

  3. Click Save.
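Before saving, you can optionally confirm the permission by running a quick check while connected as the service account; this is an illustrative query that assumes the example table name public.alation_qlog:

-- Run as the Alation service account; completing without a permission error
-- confirms SELECT access to the QLI table.
SELECT COUNT(*) FROM public.alation_qlog;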

Configure Custom Query-Based QLI

The custom query takes precedence over the QLI table name.

Amazon Redshift is based on PostgreSQL and treats identifiers as case-insensitive by default. To preserve case sensitivity, enclose column names in double quotes.
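For example, an unquoted alias is folded to lowercase, while a double-quoted alias can preserve its case (on Amazon Redshift this also depends on the enable_case_sensitive_identifier setting). The snippet below is illustrative only and uses the example table public.alation_qlog:

-- Unquoted identifiers are folded to lowercase; double-quoted identifiers
-- can keep their case when case-sensitive identifiers are enabled.
SELECT
    text AS queryString,      -- returned as querystring
    text AS "queryString"     -- case preserved
FROM public.alation_qlog
LIMIT 1;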

To configure custom query-based QLI, you must:

  • Use the QLI query template to create a query structure

  • Provide the custom query in Alation

  • Verify that the service account has privileges to execute this query.

Use the QLI Query Template

The template for the QLI query is given below. You can customize it by adding, removing, or changing the filter, but the columns and their aliases must remain as-is since Alation expects this query structure. An illustrative customization is shown after the template.

Note

When using the QLI query template, do not substitute the STARTTIME and ENDTIME parameters in the WHERE filter. They are mandatory, are not actual column names, and must stay as-is. The connector substitutes them with the start and end dates of the QLI range selected in the user interface when QLI runs manually or on a schedule.

Refer to the following template, which shows the mandatory column aliases for QLI and how the queries are fetched from a table:

SELECT
    user_name AS username,
    to_char(start_time, 'YYYY-MM-DD HH:MI:SS.US') AS startTime,
    text AS queryString,
    session_id AS sessionId,
    seconds_taken AS seconds,
    'false' AS cancelled,
    default_database AS defaultDatabases,
    split_part(session_id, '/', 2) AS sessionStartTime,
    seq
FROM
    public.alation_qlog
WHERE
    starttime BETWEEN STARTTIME AND ENDTIME
    AND queryString IS NOT NULL
    AND queryString <> ''
    AND username != 'rdsdb'
    AND canceled = ''
ORDER BY startTime, username, seq;
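For example, you might tighten the filter without touching the SELECT list or its aliases. The sketch below is illustrative only: it filters on the underlying table columns, and etl_service_user is a placeholder for an additional account you may want to exclude.

-- Illustrative customization: only the WHERE filter changes; the SELECT list
-- and aliases stay exactly as in the template above.
SELECT
    user_name AS username,
    to_char(start_time, 'YYYY-MM-DD HH:MI:SS.US') AS startTime,
    text AS queryString,
    session_id AS sessionId,
    seconds_taken AS seconds,
    'false' AS cancelled,
    default_database AS defaultDatabases,
    split_part(session_id, '/', 2) AS sessionStartTime,
    seq
FROM
    public.alation_qlog
WHERE
    start_time BETWEEN STARTTIME AND ENDTIME
    AND text IS NOT NULL
    AND text <> ''
    AND user_name NOT IN ('rdsdb', 'etl_service_user')  -- exclude system and service accounts
ORDER BY startTime, username, seq;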

Provide the Custom Query

  1. On the Amazon Redshift data source, go to the Query Log Ingestion tab of the Settings page.

  2. Under the Provide the QLI table name section, go to Alternatively, use a custom SQL query.

  3. In the Custom QLI Query field, provide a custom query to retrieve the query history.

    Note

    The custom query takes precedence over the QLI table name.

  4. Click Save.

Example: Custom Query to Directly Fetch Queries

SELECT
    cast(sqt.text AS VARCHAR) AS queryString,
    sqt.sequence AS seq,
    CASE
        WHEN sqh.status IN ('failed', 'canceled') THEN true
        ELSE false
    END AS cancelled,
    datediff(millisecond, sqh.start_time, sqh.end_time) / 1000.0 AS seconds,
    cast(pui.usename AS VARCHAR) AS userName,
    cast(ssh.database_name AS VARCHAR) AS defaultDatabases,
    cast(sqh.session_id AS VARCHAR) AS sessionId,
    ssh.start_time AS sessionStartTime,
    sqt.start_time AS startTime
FROM
    sys_query_history sqh
    JOIN sys_query_text sqt ON sqh.query_id = sqt.query_id
    JOIN pg_user_info pui ON pui.usesysid = sqh.user_id
    JOIN sys_session_history ssh ON ssh.session_id = sqh.session_id
WHERE
    sqh.user_id > 1
    AND sqt.start_time BETWEEN STARTTIME AND ENDTIME
ORDER BY startTime, userName, seq;

The filter sqh.user_id > 1 excludes system-generated queries.
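Amazon Redshift generally shows a regular user only their own activity in system views; seeing other users' queries typically requires a superuser or unrestricted system log access. A hedged sketch of granting that access, where alation_svc is a placeholder user name:

-- Give the Alation service account visibility into all users' activity in the
-- system log views; alation_svc is a placeholder. Depending on your cluster,
-- reading pg_user_info may require additional privileges.
ALTER USER alation_svc WITH SYSLOG ACCESS UNRESTRICTED;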

Test the Access and Find the Query History Size

Before you perform the QLI, you must validate that the service account has access to the QLI table and gauge the approximate size of the query history metadata. The size is estimated based on the average query volume of the last 7 days.

To test the access and find out the approximate size of the query history metadata, perform these steps:

  1. On the Settings page of your Amazon Redshift data source, go to the Query Log Ingestion tab.

  2. Under the Test access and query history size section, click Test.

    A dialog box displays the access validation result. Upon successful validation, it also shows the estimated size of the query history, based on the query volume of the last 7 days.

    Note

    If the average query volume for the last seven days exceeds 500K, a warning message is displayed. In such cases, Alation recommends that you schedule the QLI job to run daily.

Preview Results

Before performing the QLI, perform these steps to preview the queries:

  1. On the Settings page of your Amazon Redshift data source, go to the Query Log Ingestion tab.

  2. Under the Preview Results section, enter the date range for which you want to generate the preview of the query history.

  3. Click Preview.

  4. Click View Results to view the generated preview.

    The Preview dialog displays the total number of query statements per user under the User Queries tab and the detailed query statements under the Statements tab. Click Download to download the detailed query statements as a JSON file.

Note

You can use this option to run the default QLI using just the date range.

Run QLI

You can either run QLI manually on demand or configure it to run automatically on a schedule.

Run QLI Manually

To perform QLI manually on demand:

  1. On the Settings page of your Amazon Redshift data source, go to the Query Log Ingestion tab.

  2. Under the Run QLI section, turn off the Enable QLI Schedule toggle.

  3. Specify the desired date range using the Date Range calendar widgets. You will need to specify the start date and the end date separately.

  4. Click Import.

    A query log ingestion job is initiated.

Schedule QLI

  1. On the Settings page of your Amazon Redshift data source, go to the Query Log Ingestion tab.

  2. Under the Run QLI section, turn on the Enable QLI Schedule toggle.

  3. Specify values for the job recurrence and time. The values are set in your local time.


Note

Here are some recommended schedules for better performance:

  • Schedule QLI to run every 12 hours at the 30th minute of the hour

  • Schedule QLI to run every 2 days at 11:30 PM

  • Schedule QLI to run every week on Sunday and Wednesday

  • Schedule QLI to run every 3 months on the 15th day of the month

  4. Click Import.

The next QLI runs on the set schedule.

View the Job Status

To view the QLI job status after you run the QLI manually or after Alation triggers the QLI as per the schedule, go to Query Log Ingestion > QLI Job Status.


The Query log ingestion job status table logs the following statuses:

  • Succeeded - Indicates that the query ingestion was successful.

  • Partial Success - Indicates that the query ingestion was successful with warnings. If Alation fails to ingest some objects during QLI, it skips them and proceeds with the query ingestion, resulting in partial success. Similarly, if all the ingested queries belong to a single user, QLI results in partial success.

  • Failed - Indicates that the query ingestion failed with errors.

Click the View Details link to view a detailed report of the query ingestion job. If there are errors, the Job errors table displays the error category, error message, and a hint (ways to resolve the issue). Follow the instructions under the Hints column to resolve the error.