Starburst Trino OCF Connector: Install and Configure¶
Alation Cloud Service Applies to Alation Cloud Service instances of Alation
Customer Managed Applies to customer-managed instances of Alation
Network Configuration¶
Open the inbound TCP port that corresponds with the port you have configured in Starburst Trino.
For example, 8080 for an HTTP connection, 443 for TLS-enabled clusters, or 7778 if the database uses Kerberos authentication.
SSL Certificate¶
To configure the connection over SSL, obtain the SSL certificate file for your Starburst Trino server. You will need to upload the SSL certificate when configuring your data source in Alation.
Service Account¶
Create a user with Starburst Trino:
Create a user for the Alation service account.
Create a role for the Alation service account:
CREATE ROLE <alation_role>;Grant the role to the user:
GRANT <alation_role> TO USER <alation>;
Permissions for Metadata Extraction¶
Grant the role <alation_role>
permissions for these system schemas:
system.jdbc.schemas
system.jdbc.catalogs
system.jdbc.table_types
system.jdbc.tables
system.jdbc.columns
SQL examples¶
GRANT SELECT TO <alation_role>;
GRANT USAGE ON SCHEMA system.jdbc.schemas TO <alation_role>;
GRANT USAGE ON SCHEMA system.jdbc.catalogs TO <alation_role>;
GRANT USAGE ON SCHEMA system.jdbc.table_types TO <alation_role>;
GRANT USAGE ON SCHEMA system.jdbc.tables TO <alation_role>;
GRANT USAGE ON SCHEMA system.jdbc.columns TO <alation_role>;
Permissions for Profiling¶
To perform table and column sampling and column profiling, the service account needs the SELECT permissions on the tables that need to be profiled, for example:
GRANT SELECT ON <database>.<schema_name>.<table_name> TO ROLE <alation_role>;
Permissions for Query Log Ingestion¶
Permissions for Query Log Ingestion (QLI) depend on the type of QLI you will be configuring:
For table-based QLI, the service account must have the SELECT permission for the QLI view.
For custom query-based QLI, the service account must have the SELECT permission for the
system.runtime.queries
table, with GRANT OPTION:GRANT SELECT ON system. runtime.queries TO ROLE <alation_role> WITH GRANT OPTION;
Find more information about QLI, see Query Log Ingestion.
JDBC URI¶
The JDBC URI can include a specific catalog name. If it’s included, then Alation will only extract schemas from that specific catalog. If the URI does not include a catalog name, Alation will extract schemas from all catalogs.
Note
The ability to extract schemas from a specific catalog is available from connector version 1.2.6.
Format¶
Extract Schemas from All Catalogs¶
jdbc:trino://host:port
Example
trino://ip-10-13-82-39.alation-test.com:8080
Extract Schemas from One Catalog¶
jdbc:trino://host:port/catalog_name
Example
trino://ip-10-13-82-39.alation-test.com:7778/hive
Authentication¶
The OCF connector for Starburst Trino supports basic authentication and authentication via Kerberos.
Basic Authentication¶
Basic authentication requires the username and password of the Service Account you created for Alation.
Kerberos Authentication¶
If your database is kerberized, prepare the following information and files:
Username and password of the service account
Kerberos configuration file (krb5.conf)
The keytab file if keytabs are used
Trino coordinator Kerberos service name
Kerberos Principal
Connection Over SSL¶
The OCF connector for Starburst Trino supports SSL connections. You will need to obtain the SSL certificate for your database and upload it in the settings of the data source you create in Alation (see Application Settings below).
Connector versions older than 1.2.4 only support the .cer certificate format. Convert the .jks file into .cer before uploading in Alation.
Connector versions 1.2.4 and newer, support the .jks format, but not .cer.
Configuration in Alation¶
STEP 1: Install the Connector¶
Alation On-Premise¶
Important
Installation of OCF connectors requires Alation Connector Manager to be installed as a prerequisite.
To install an OCF connector:
If this has not been done on your instance, install the Alation Connector Manager: Install Alation Connector Manager.
Ensure that the OCF connector Zip file is available on your local machine.
Install the connector on the Connectors Dashboard page using the steps in Manage Connectors.
Alation Cloud Service¶
Note
On Alation Cloud Service instances, Alation Connector Manager is available by default.
Depending on your network configuration, you may need to use Alation Agent to connect to databases.
Connection via Alation Agent¶
Ensure that Alation Agent is enabled on your Alation instance. If necessary, create a Support ticket with Alation for an Alation representative to enable the Alation Agent feature on your instance and to receive the Alation Agent installer.
Install the connector on the Connectors Dashboard page using the steps in Manage Connectors.
Connection Without Agent¶
To install an OCF connector:
Ensure that the OCF connector Zip file is available on your local machine.
Install the connector on the Connectors Dashboard page using the steps in Manage Connectors.
STEP 2: Create and Configure a New Data Source¶
In Alation, add a new data source:
Log in to Alation as a Server Admin.
Expand the Apps menu on the right of the main toolbar and select Sources.
On the Sources page, click +Add on the top right of the page and in the list that opens, click Data Source. This will open the Add a Data Source wizard.
On the first screen of the wizard, specify a name for your data source, assign additional Data Source Admins, if necessary, and click the Continue Setup button on the bottom. The Add a Data Source screen will open.
On the Add a Data Source screen, the only field you should populate is Database Type. From the Database Type dropdown, select the connector name. After that you will be navigated to the Settings page of your new data source.
Note
Agent-based connectors will have the Agent name appended to the connector name.
The name of this connector is Starburst Trino OCF connector.
Access¶
On the Access tab, set the data source visibility using these options:
Public Data Source—The data source will be visible to all users of the catalog.
Private Data Source—The data source will be visible to the users allowed access to the data source by Data Source Admins.
You can add new Data Source Admin users in the Data Source Admins section.
General Settings¶
Note
This section describes configuring settings for credentials and connection information stored in the Alation database. If your organization has configured Azure KeyVault or AWS Secrets Manager to hold such information, the user interface for the General Settings page will change to include the following icons to the right of most options:
By default, the database icon is selected, as shown. In the vault case, instead of the actual credential information, you enter the ID of the secret. See Configure Secrets for OCF Connector Settings for details.
Perform the configuration on the General Settings tab.
Application Settings¶
Specify Application Settings if applicable. Click Save to save the changes after providing the information.
Parameter |
Description |
---|---|
BI Connection Info |
This parameter is used to generate lineage between the current data source and another source in the catalog, for example a BI source that retrieves data from the underlying database. The parameter accepts host and port information of the corresponding BI data source connection. Use the following format: You can provide multiple values as a comma-separated list:
Find more details in BI Connection Info. |
Disable Automatic Lineage Generation |
Select this checkbox to disable automatic lineage generation from QLI, MDE, and Compose queries. By default, automatic lineage generation is enabled. |
Connector Settings¶
Populate the data source connection information and save the values by clicking Save in this section.
Data Source Connection¶
Populate the data source connection information and save the values by clicking Save in this section.
Parameter |
Description |
---|---|
JDBC URI |
Specify the JDBC URI in the required format. |
Username |
Specify the service account username. |
Password |
Specify the service account password. |
Enable Kerberos authentication |
Select this checkbox if using Kerberos authentication and upload the krb5.conf file using the upload link under the checkbox. |
Use keytab |
Select this checkbox if using keytabs and upload the keytab file for the service account using the upload link under the checkbox. |
Trino coordinator Kerberos service name |
If using Kerberos authentication, provide the name of your Trino coordinator service. |
Enable SSL |
Enable or disable SSL authentication by selecting or clearing the Enable SSL checkbox. If the Enable SSL checkbox is enabled, upload the SSL certificate using the upload link below. |
Truststore password |
Specify the password for the SSL certificate. The password will be deleted if the data source connection is deleted. |
Logging Configuration¶
Select the logging level for the connector logs and save the values by clicking Save in this section. The available log levels are based on the Log4j framework.
Parameter |
Description |
---|---|
Log level |
Select the log level to generate logs. The available options are INFO, DEBUG, WARN, TRACE, ERROR, FATAL, ALL. |
You can view the connector logs in Admin Settings > Manage Connectors > Starburst Trino OCF connector.
Obfuscate Literals¶
Obfuscate Literals—Enable this toggle to hide the details of the queries in the catalog page that are ingested via QLI or executed in Compose. This toggle is disabled by default.
Test Connection¶
After specifying the connector settings, under Test Connection, click Test to validate network connectivity.
Metadata Extraction¶
You can configure metadata extraction (MDE) for an OCF data source on the Metadata Extraction tab of the Settings page. Refer to Configure Metadata Extraction for OCF Data Sources for information about the available configuration options.
Compose¶
For details about configuring the Compose tab of the Settings, refer to Configure Compose for OCF Data Sources.
Sampling and Profiling¶
Alation supports a number of ways to retrieve data samples and column profiles. For details, see Configure Sampling and Profiling for OCF Data Sources.
Column Profiling Queries for Starburst¶
Alation uses these default queries for deep column profiling of the Starburst data source. You can customize them to better suit your needs.
If using these examples, provide a value instead of the placeholder <limit>
.
Numeric Data Types¶
SELECT
MIN({column_name}) AS Minimum,
MAX({column_name}) AS Maximum,
AVG({column_name}) AS Average,
APPROX_PERCENTILE(CAST({column_name} AS double),0.5) AS Median,
SUM(CASE WHEN {column_name} IS NULL THEN 1 ELSE 0 END) AS "#NULL",
(100.0 * (COUNT(*) - COUNT({column_name})) / NULLIF (COUNT(*),0)) AS "%NULL"
FROM {table_name} LIMIT <limit>
Non-Numeric Data Types¶
SELECT
SUM(CASE WHEN {column_name} IS NULL THEN 1 ELSE 0 END) AS "#NULL",
(SUM(CASE WHEN {column_name} IS NULL THEN 1 ELSE 0 END) / COUNT(*)) * 100 AS "%NULL"
FROM {table_name} LIMIT <limit>
Query Log Ingestion¶
A Starburst Trino OCF data source supports table-based or query-based QLI:
Table-Based QLI—You will need to create a view on top of the system table that stores query history. Alation will query the view to retrieve query information into the catalog.
Custom query-based QLI—If you cannot create the QLI view due to access restrictions or other reasons, you can bring in query history by querying the system table directly.
Table-Based QLI¶
Create the QLI View¶
On Starburst Trino, create a view on top of the system table system.runtime.queries
to store query history. Grant the Alation service account the SELECT permissions for this view.
Note
If the JDBC URI you provided in Alation includes a catalog name, then the connector will apply the catalog filter when querying the view and only retrieve query history for the source that equals the catalog name in the URI.
CREATE OR REPLACE VIEW <view_name> AS SELECT user, query, source, created, started, "end", query_id, state FROM system.runtime.queries WHERE state = 'FINISHED';
Configure Table-Based QLI in Alation¶
To configure table-based QLI:
Go to the Query Log Ingestion tab of the Settings page of your OCF Starburst Trino data source.
Under Connector Settings > Query Extraction, specify the name of the QLI view in the Table Name field.
Important
Use the format
schema_name.view_name
.Click Save.
Custom Query-Based QLI¶
If you cannot create the view for QLI due to access restrictions, you can use a custom QLI query to extract query history into Alation. The template for the QLI query is given below. You can customize the template by adding, removing, or changing the filter, but the columns and their aliases must remain as is since Alation expects this query structure.
To use query-based QLI, ensure that the service account has the SELECT permission with GRANT OPTION for the system table system.runtime.queries
. See an example in Permissions for Query Log Ingestion.
QLI Query Template¶
When using the QLI query template, do not substitute the STARTTIME
and ENDTIME
parameters in the WHERE filter. These parameters are not actual column names and should stay as is. They are expected by the connector and will be substituted with the start and end date of the QLI range selected in the user interface when QLI is run manually or on schedule.
SELECT user AS userName, query AS queryString, source AS defaultDatabases, False AS sessionId, created AS sessionStartTime, started AS startTime, False AS cancelled, date_diff('second',"started", "end") AS seconds, query_id AS sequence FROM system.runtime.queries WHERE state ='FINISHED' AND started BETWEEN timestamp STARTTIME AND timestamp ENDTIME;
Configure Query-Based QLI in Alation¶
To configure query-based QLI:
Go to the Query Log Ingestion tab of the Settings page of your OCF Starburst Trino data source.
Under Connector Settings > Query Extraction, specify the QLI query in the Custom QLI Query field.
Click Save.
Run QLI Manually¶
To perform QLI manually on demand:
On the Query Log Ingestion tab of the Settings page of your OCF data source, under Automated and Manual Query Log Ingestion, ensure that the Enable Automated Query Log Ingestion toggle is disabled.
Specify the desired date range using the Date Range calendar widgets. The start and end dates are specified separately.
Click the Preview button to preview the queries that will be ingested. This will run a preview job that will fetch a subset of the queries based on your QLI configuration. The result will be displayed as a table.
Click the Import button to run QLI manually. This will run a query log ingestion job.
View Job Status¶
The status of preview or QLI jobs will be logged in the Query Log Ingestion Job Status table at the bottom of the page. In this table, click the status link or the View Details link for a job to view the details on the progress of this job.
Schedule QLI¶
To schedule QLI :
On the Query Log Ingestion tab, under Automated and Manual Query Log Ingestion, enable the Enable Automated Query Log Ingestion toggle.
In the Automated Query Log Ingestion Time panel that will appear, specify values for the job recurrence and time. The values are set in your local time.
Note
The hourly schedule for automated QLI is not supported.
The next QLI will run on the schedule you set.
Lineage¶
Lineage information will be generated automatically during query log ingestion (QLI) and when users run Data Definition Language queries in Compose, such as CREATE VIEW or CREATE OR REPLACE VIEW.
Note
View SQL is currently not extracted during metadata extraction.
Troubleshooting¶
Refer to Troubleshooting.