Versions 2.2.0 or Newer

Alation Cloud Service Applies to Alation Cloud Service instances of Alation

Customer Managed Applies to customer-managed instances of Alation

Query log ingestion (QLI) extracts and ingests the query history of a database and powers the lineage, popularity, top user, join and filter information in the catalog. You’ll be able to explore examples of ingested queries on schema and table catalog pages.

Important

Query Log Ingestion (QLI) is available from connector version 2.0.2 or newer.

You can configure query log ingestion (QLI) on the Query Log Ingestion tab of the Settings page. QLI can be set up in one of the following ways:

  • Default QLI: Uses a predefined query on the system audit log table to extract query history. No custom SQL is required.

  • Custom Query-Based QLI: Allows you to specify a custom SQL query to extract query history from a table or view of your choice.

Prerequisites for Default QLI

Note

System Audit Log Table is available in Databricks Unity Catalog on Azure, AWS, and GCP. The system audit log table is a system table that contains the audit logs of all user activities in the Databricks workspace. It is used by Alation to extract query history for QLI.

Default QLI uses the system audit log table to extract query history. Before you can perform QLI from your data source in Alation, you will need to:

  1. Enable the system.access schema:

  2. Configure audit log delivery on your Databricks account and workspace:

  3. Enable verbose audit logging to capture additional events in the audit log table. Your logs will include the following additional events:

    • For notebooks: runCommand

    • For SQL warehouse: commandStart, commandFinish

    Alation requires these events to be available in the audit table to successfully extract and ingest query history:

  4. Grant Permissions for QLI to the Alation service account.

  5. In Alation, ensure that you run metadata extraction (MDE) before running QLI. For information on MDE, refer to Configure Metadata Extraction.

Default QLI Query

Alation runs the following query on the system.access.audit table to retrieve query history:

SELECT request_event_time AS startAt,
  username,
  queryText,
  TIMESTAMPDIFF(SECOND, request_event_time, response_event_time) AS executionTime,
  statusCode,
  errorMessage,
  session_id
FROM (
  SELECT
    user_identity.email AS username,
    request_params.commandText AS queryText,
    event_time AS request_event_time,
    request_params.commandId
  FROM system.access.audit
  WHERE action_name = 'commandSubmit'
  AND request_params.commandText IS NOT NULL
  AND request_params.commandText <>  ''
  ) submitTable
  LEFT JOIN (
    SELECT
      response.status_code AS statusCode,
      response.error_message AS errorMessage,
      session_id,
      event_time AS response_event_time,
      request_params.commandId
    FROM system.access.audit
    WHERE action_name = 'commandFinish'
    ) finishTable
WHERE submitTable.commandId = finishTable.commandId
  AND submitTable.request_event_time >= "FROM_TIME"
  AND submitTable.request_event_time <="TO_TIME"
  AND finishTable.statusCode = 200
  AND finishTable.errorMessage IS NULL
UNION ALL
SELECT
  event_time AS startAt,
  user_identity.email AS username,
  request_params.commandText AS queryText,
  round(request_params.executionTime,0) AS executionTime,
  response.status_code AS statusCode,
  response.error_message AS errorMessage,
  session_id
FROM system.access.audit
WHERE action_name in ('runCommand')
  AND event_time >= "FROM_TIME"
  AND event_time <= "TO_TIME"
  AND request_params.commandText IS NOT NULL
  AND request_params.commandText <>  ''
  AND response.status_code = 200
  AND response.error_message IS NULL;

Custom Query-Based QLI

Important

Custom query-based QLI is supported from connector version 3.4.0 and Alation version 2023.3.4 or newer.

When you configure a custom query, it overrides the default QLI query.

To use custom query-based QLI, perform these steps:

  1. In the Unity Databricks data source, open the Query Log Ingestion tab on the Settings page.

  2. Go to the Custom Query-Based QLI section.

  3. Enter your custom SQL query in the Custom QLI Query field.

Ensure that the service account has the required permissions to select from the relevant system table or view for the custom query to succeed.

Databricks stores query logs in the following system tables.

  1. Audit Logs Table: system.access.audit

  2. Query History Table: system.query.history

Audit Logs Table vs. Query History Table

Audit Logs Table

Query History Table

Scope

Records all user activities, including query execution, file access, and permission changes.

Logs queries executed via SQL warehouses or serverless compute for notebooks/jobs.

Setup Complexity

Requires more setup. See Prerequisites for Default QLI.

Easier to set up.

Security

Includes audit logs and user login and logout details.

Does not include user login/logout or other sensitive audit log details.

Below are the templates for the queries that you can use in the Custom QLI Query field. You can modify these templates to suit your requirements, including adjusting filters as needed.

Custom-Query Template

Note

Ensure that the service account has the necessary permissions to access the system.access.audit table. See Prerequisites for Default QLI.

For more information on Databricks audit logs, refer to https://docs.databricks.com/aws/en/admin/system-tables/audit-logs

The following query retrieves query history from the system.access.audit table:

SELECT userName,
      '' AS defaultDatabases,
      queryString,
      session_id AS sessionID,
      request_event_time AS startTime,
      request_event_time AS sessionStartTime,
      timestampdiff(SECOND, request_event_time, response_event_time) AS seconds,
      false AS cancelled
FROM (
    SELECT user_identity.email AS userName,
          request_params.commandText AS queryString,
          event_time AS request_event_time,
          request_params.commandId
    FROM system.access.audit
    WHERE action_name = 'commandSubmit'
      AND request_params.commandText IS NOT NULL
      AND request_params.commandText <> ''
) submitTable
LEFT JOIN (
    SELECT response.status_code AS statusCode,
          response.error_message AS errorMessage,
          session_id,
          event_time AS response_event_time,
          request_params.commandId
    FROM system.access.audit
    WHERE action_name = 'commandFinish'
) finishTable
  ON submitTable.commandId = finishTable.commandId
WHERE submitTable.request_event_time >= STARTTIME
  AND submitTable.request_event_time <= ENDTIME
  AND finishTable.statusCode = 200
  AND finishTable.errorMessage IS NULL

UNION ALL

SELECT user_identity.email AS userName,
      '' AS defaultDatabases,
      request_params.commandText AS queryString,
      session_id AS sessionID,
      event_time AS startTime,
      event_time AS sessionStartTime,
      ROUND(request_params.executionTime, 0) AS seconds,
      false AS cancelled
FROM system.access.audit
WHERE action_name IN ('runCommand')
  AND request_params.commandText IS NOT NULL
  AND request_params.commandText <> ''
  AND event_time >= STARTTIME
  AND event_time <= ENDTIME
  AND response.status_code = 200
  AND response.error_message IS NULL;

Performing QLI

You can either run QLI manually on demand or configure it to run automatically on a schedule.

Note

Alation supports a maximum SQL query length of 100K rows. There is also a 50 KB limitation on query file size. Queries with more than 100K rows or queries larger than 50 KB cannot be parsed and ingested by QLI jobs.

The steps involved in configuring and running QLI are:

Use Custom Query for QLI (Optional)

Perform these steps to configure custom query-based QLI:

  1. In the Unity Databricks data source, open the Query Log Ingestion tab on the Settings page.

  2. Under Step 1: Use Custom Query for QLI (optional), enter your custom SQL query in the Custom QLI Query field and save your changes.

Note

Your custom query for QLI must meet the following requirements:

  • Returns the following columns with the exact aliases: userName, startTime, queryString, defaultDatabases, sessionID, sessionStartTime, cancelled, and seconds.

  • Uses STARTTIME and ENDTIME as filter parameters in the WHERE clause.

  • Allows the service account to execute the query with the required permissions.

Test Access and Query History Size

Before you run QLI, you can validate that the user has access to the QLI tables and estimate the approximate size of the query history metadata. The size is estimated based on the average query volume of the last seven days.

To test the access and query history size:

Under the section of the user interface Step 2: Test access and query history size, click Test. A dialog box appears displaying the test progress. Upon successful validation, you’ll get a confirmation that the user has the required permissions to continue.

Preview Results (Optional)

Before running QLI, you can preview the queries to be ingested.

To preview queries:

  1. Under the section Step 3: Preview Results (optional), enter the date range for which you want to generate a preview.

  2. Click Preview.

  3. Click View Results to view the generated report.

  4. You can download the preview as a JSON file. Click Download Preview Results in the preview window to download it.

Run QLI

You can either run QLI manually on demand or configure it to run automatically on a schedule:

Run QLI Manually

To run QLI on demand:

  1. Under the section Step 4: Run QLI, specify the desired date range using the Date Range calendar widgets. You will need to specify the start date and the end date of the date range separately.

  2. Click Import. This will initiate a QLI job.

  3. You can monitor the status and history of your QLI jobs on the QLI Job History page. To open the QLI job history, click Go to QLI Job History.

Find more details about the QLI job history in View the QLI Job Status.

Schedule QLI

You can configure QLI to run automatically on a schedule that you select.

To schedule QLI:

  1. Under the section Step 3: Run QLI, enable the Enable QLI Schedule toggle.

  2. Specify values for the job recurrence and time. The values are set in your local time. The next QLI job for your data source will run on the schedule you have specified.

View the QLI Job Status

On the QLI Job History sub-tab, view the status of your QLI jobs.

To open the list of QLI jobs that were run on the data source, click the QLI Job History sub-tab on top of the page. The QLI jobs are displayed in the query log ingestion job status table.

A QLI job can have one of these statuses:

  • Succeeded—The job was successful.

  • Partial Success—The job was successful, but there are some warnings. For example, if Alation fails to ingest some objects, it skips them and proceeds to ingest other objects, with the QLI job resulting in partial success.

  • Failed—The job failed due to errors.

For each of the QLI jobs listed in the ingestion job status table, you can view the detailed log messages. Click the status link or the View Details link to open a detailed report in a pop-up info box. If there were errors during QLI, the corresponding error messages are displayed in the Job errors table. Follow the instructions in the Hint column for troubleshooting tips.