Versions 2.2.0 or Newer¶
Applies to: Alation Cloud Service and customer-managed instances of Alation.
Query log ingestion (QLI) extracts and ingests the query history of a database and powers lineage, popularity, top users, and join and filter information in the catalog. You can explore examples of ingested queries on schema and table catalog pages.
Important
Query Log Ingestion (QLI) is available from connector version 2.0.2 or newer.
You can configure query log ingestion (QLI) on the Query Log Ingestion tab of the Settings page. QLI can be set up in one of the following ways:
Default QLI: Uses a predefined query on the system audit log table to extract query history. No custom SQL is required.
Custom Query-Based QLI: Allows you to specify a custom SQL query to extract query history from a table or view of your choice.
Prerequisites for Default QLI¶
Note
System Audit Log Table is available in Databricks Unity Catalog on Azure, AWS, and GCP. The system audit log table is a system table that contains the audit logs of all user activities in the Databricks workspace. It is used by Alation to extract query history for QLI.
Default QLI uses the system audit log table to extract query history. Before you can perform QLI from your data source in Alation, you must:

1. Enable the system.access schema.
2. Configure audit log delivery on your Databricks account and workspace.
3. Enable verbose audit logging to capture additional events in the audit log table. Verbose logging adds the following events, which Alation requires in the audit table to successfully extract and ingest query history:
   - For notebooks: runCommand
   - For SQL warehouses: commandStart, commandFinish
4. Grant permissions for QLI to the Alation service account.
5. In Alation, run metadata extraction (MDE) before running QLI. For information on MDE, refer to Configure Metadata Extraction.
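As an illustration, the following Unity Catalog grants would give the Alation service account read access to the audit table. The account name is a placeholder; your workspace may use a different principal or group-based grants:

```sql
-- Illustrative only: replace the placeholder account with your service principal.
GRANT USE CATALOG ON CATALOG system TO `alation_service@example.com`;
GRANT USE SCHEMA ON SCHEMA system.access TO `alation_service@example.com`;
GRANT SELECT ON TABLE system.access.audit TO `alation_service@example.com`;
```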
Default QLI Query¶
Alation runs the following query on the system.access.audit table to retrieve query history:
SELECT request_event_time AS startAt,
       username,
       queryText,
       TIMESTAMPDIFF(SECOND, request_event_time, response_event_time) AS executionTime,
       statusCode,
       errorMessage,
       session_id
FROM (
    SELECT user_identity.email AS username,
           request_params.commandText AS queryText,
           event_time AS request_event_time,
           request_params.commandId
    FROM system.access.audit
    WHERE action_name = 'commandSubmit'
      AND request_params.commandText IS NOT NULL
      AND request_params.commandText <> ''
) submitTable
LEFT JOIN (
    SELECT response.status_code AS statusCode,
           response.error_message AS errorMessage,
           session_id,
           event_time AS response_event_time,
           request_params.commandId
    FROM system.access.audit
    WHERE action_name = 'commandFinish'
) finishTable
ON submitTable.commandId = finishTable.commandId
WHERE submitTable.request_event_time >= "FROM_TIME"
  AND submitTable.request_event_time <= "TO_TIME"
  AND finishTable.statusCode = 200
  AND finishTable.errorMessage IS NULL
UNION ALL
SELECT event_time AS startAt,
       user_identity.email AS username,
       request_params.commandText AS queryText,
       ROUND(request_params.executionTime, 0) AS executionTime,
       response.status_code AS statusCode,
       response.error_message AS errorMessage,
       session_id
FROM system.access.audit
WHERE action_name = 'runCommand'
  AND event_time >= "FROM_TIME"
  AND event_time <= "TO_TIME"
  AND request_params.commandText IS NOT NULL
  AND request_params.commandText <> ''
  AND response.status_code = 200
  AND response.error_message IS NULL;
Custom Query-Based QLI¶
Important
Custom query-based QLI is supported from connector version 3.4.0 and Alation version 2023.3.4 or newer.
When you configure a custom query, it overrides the default QLI query.
To use custom query-based QLI, perform these steps:
1. In the Unity Databricks data source, open the Query Log Ingestion tab on the Settings page.
2. Go to the Custom Query-Based QLI section.
3. Enter your custom SQL query in the Custom QLI Query field.

Ensure that the service account has the required permissions to select from the relevant system table or view for the custom query to succeed.
Databricks stores query logs in the following system tables:

- Audit Logs Table: system.access.audit
- Query History Table: system.query.history
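Before pointing QLI at either table, it can be worth confirming that the service account can actually read them. A minimal sanity check, assuming the event_date and start_time columns documented for these system tables:

```sql
-- Each query should return a row count rather than a permission error.
SELECT COUNT(*) FROM system.access.audit  WHERE event_date >= current_date() - INTERVAL 1 DAY;
SELECT COUNT(*) FROM system.query.history WHERE start_time >= current_date() - INTERVAL 1 DAY;
```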
Audit Logs Table vs. Query History Table¶
| | Audit Logs Table | Query History Table |
|---|---|---|
| Scope | Records all user activities, including query execution, file access, and permission changes. | Logs queries executed via SQL warehouses or serverless compute for notebooks and jobs. |
| Setup Complexity | Requires more setup. See Prerequisites for Default QLI. | Easier to set up. |
| Security | Includes audit logs and user login and logout details. | Does not include user login/logout or other sensitive audit log details. |
Below are the templates for the queries that you can use in the Custom QLI Query field. You can modify these templates to suit your requirements, including adjusting filters as needed.
Custom-Query Template¶
Note
Ensure that the service account has the necessary permissions to access the system.access.audit table. See Prerequisites for Default QLI.
For more information on Databricks audit logs, refer to https://docs.databricks.com/aws/en/admin/system-tables/audit-logs
The following query retrieves query history from the system.access.audit table:
SELECT userName,
       '' AS defaultDatabases,
       queryString,
       session_id AS sessionID,
       request_event_time AS startTime,
       request_event_time AS sessionStartTime,
       TIMESTAMPDIFF(SECOND, request_event_time, response_event_time) AS seconds,
       false AS cancelled
FROM (
    SELECT user_identity.email AS userName,
           request_params.commandText AS queryString,
           event_time AS request_event_time,
           request_params.commandId
    FROM system.access.audit
    WHERE action_name = 'commandSubmit'
      AND request_params.commandText IS NOT NULL
      AND request_params.commandText <> ''
) submitTable
LEFT JOIN (
    SELECT response.status_code AS statusCode,
           response.error_message AS errorMessage,
           session_id,
           event_time AS response_event_time,
           request_params.commandId
    FROM system.access.audit
    WHERE action_name = 'commandFinish'
) finishTable
ON submitTable.commandId = finishTable.commandId
WHERE submitTable.request_event_time >= STARTTIME
  AND submitTable.request_event_time <= ENDTIME
  AND finishTable.statusCode = 200
  AND finishTable.errorMessage IS NULL
UNION ALL
SELECT user_identity.email AS userName,
       '' AS defaultDatabases,
       request_params.commandText AS queryString,
       session_id AS sessionID,
       event_time AS startTime,
       event_time AS sessionStartTime,
       ROUND(request_params.executionTime, 0) AS seconds,
       false AS cancelled
FROM system.access.audit
WHERE action_name = 'runCommand'
  AND request_params.commandText IS NOT NULL
  AND request_params.commandText <> ''
  AND event_time >= STARTTIME
  AND event_time <= ENDTIME
  AND response.status_code = 200
  AND response.error_message IS NULL;
Note
Ensure that the service account has the necessary permissions to access the system.query.history table.
For more information on Databricks query history, refer to https://docs.databricks.com/aws/en/admin/system-tables/query-history
The system.query.history table is available in Databricks Unity Catalog and contains query execution history.
The following query retrieves query history from the system.query.history table:
SELECT
executed_by AS userName,
'' AS defaultDatabases,
statement_text AS queryString,
session_id AS sessionID,
execution_duration_ms / 1000 AS seconds,
start_time AS startTime,
start_time AS sessionStartTime,
false AS cancelled
FROM system.query.history
WHERE statement_text IS NOT NULL
AND statement_text <> ''
AND execution_status = 'FINISHED'
AND statement_type NOT IN ('GRANT', 'REVOKE', 'SHOW', 'USE', 'DESCRIBE', 'COMMIT', 'OPTIMIZE')
AND start_time >= STARTTIME
AND start_time <= ENDTIME;
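The templates can be tightened further. For example, because Alation skips queries above its size limits (see Performing QLI below), a custom query could exclude oversized statements up front. This is an optional extra predicate, not part of the official template:

```sql
-- Optional addition to the system.query.history template's WHERE clause:
-- drop statements that would exceed the ~50 KB query-size limit.
-- LENGTH counts characters, so this is an approximation for multi-byte text.
AND LENGTH(statement_text) <= 50 * 1024
```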
Performing QLI¶
You can either run QLI manually on demand or configure it to run automatically on a schedule.
Note
Alation supports a maximum SQL query length of 100K rows and a maximum query file size of 50 KB. Queries longer than 100K rows or larger than 50 KB cannot be parsed and ingested by QLI jobs.
The steps involved in configuring and running QLI are:
Use Custom Query for QLI (Optional)¶
Perform these steps to configure custom query-based QLI:
In the Unity Databricks data source, open the Query Log Ingestion tab on the Settings page.
Under Step 1: Use Custom Query for QLI (optional), enter your custom SQL query in the Custom QLI Query field and save your changes.
Note
Your custom query for QLI must meet the following requirements:

- It returns the following columns with the exact aliases: userName, startTime, queryString, defaultDatabases, sessionID, sessionStartTime, cancelled, and seconds.
- It uses STARTTIME and ENDTIME as filter parameters in the WHERE clause.
- The service account has the required permissions to execute the query.
Test Access and Query History Size¶
Before you run QLI, you can validate that the user has access to the QLI tables and estimate the approximate size of the query history metadata. The size is estimated based on the average query volume of the last seven days.
To test the access and query history size:
Under Step 2: Test access and query history size, click Test. A dialog box appears, displaying the test progress. Upon successful validation, you'll see a confirmation that the user has the required permissions to continue.
Preview Results (Optional)¶
Before running QLI, you can preview the queries to be ingested.
To preview queries:
Under the section Step 3: Preview Results (optional), enter the date range for which you want to generate a preview.
Click Preview.
Click View Results to view the generated report.
You can download the preview as a JSON file. Click Download Preview Results in the preview window to download it.
Run QLI¶
You can either run QLI manually on demand or configure it to run automatically on a schedule:
Run QLI Manually¶
To run QLI on demand:
Under the section Step 4: Run QLI, specify the desired date range using the Date Range calendar widgets, selecting the start date and the end date separately.
Click Import. This will initiate a QLI job.
You can monitor the status and history of your QLI jobs on the QLI Job History page. To open the QLI job history, click Go to QLI Job History.
Find more details about the QLI job history in View the QLI Job Status.
Schedule QLI¶
You can configure QLI to run automatically on a schedule that you select.
To schedule QLI:
Under the section Step 4: Run QLI, enable the Enable QLI Schedule toggle.
Specify values for the job recurrence and time. The values are set in your local time. The next QLI job for your data source will run on the schedule you have specified.
View the QLI Job Status¶
On the QLI Job History sub-tab, view the status of your QLI jobs.
To open the list of QLI jobs that were run on the data source, click the QLI Job History sub-tab at the top of the page. The QLI jobs are displayed in the query log ingestion job status table.
A QLI job can have one of these statuses:
Succeeded—The job was successful.
Partial Success—The job was successful, but there are some warnings. For example, if Alation fails to ingest some objects, it skips them and proceeds to ingest other objects, with the QLI job resulting in partial success.
Failed—The job failed due to errors.
For each of the QLI jobs listed in the ingestion job status table, you can view the detailed log messages. Click the status link or the View Details link to open a detailed report in a pop-up info box. If there were errors during QLI, the corresponding error messages are displayed in the Job errors table. Follow the instructions in the Hint column for troubleshooting tips.