Configure Connection to Data Source¶
Alation Cloud Service: Applies to Alation Cloud Service instances of Alation
Customer Managed: Applies to customer-managed instances of Alation
After you install the Databricks on Google Cloud OCF connector, you must configure the connection to the Databricks on Google Cloud data source.
Configuring the Databricks on Google Cloud data source connection involves the steps described in the following sections.
Provide Access¶
On the Access tab of the Settings page of your Databricks on Google Cloud OCF data source, set the data source visibility using one of these options:
Public Data Source — The data source is visible to all users of the catalog.
Private Data Source — The data source is visible to the users allowed access to the data source by Data Source Admins.
You can add new Data Source Admin users in the Data Source Admins section.
Connect to Data Source¶
To connect to the data source, you must perform these steps:
Provide the JDBC URI¶
If you are using a Databricks cluster or SQL warehouse, get the JDBC URI as documented in Connection Details.
The JDBC URI string you provide in Alation depends on the connector version:
Connector versions 2.0.0 and newer use the Databricks JDBC driver.
Connector versions below 2.0.0 use the Spark JDBC driver.
When specifying the JDBC URI in Alation, remove the jdbc: prefix.
Note
The property UseNativeQuery=0 is required for custom query-based sampling and profiling. Without this property in the JDBC URI, custom query-based sampling or profiling will fail. If you are not using custom query-based sampling and profiling in your implementation of this data source type, you can omit this property from the JDBC URI string.
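The two adjustments described above (dropping the jdbc: prefix and making sure UseNativeQuery=0 is present) can be sketched in Python. This is a minimal illustration, not part of Alation or the connector: the function name to_alation_uri and its custom_query_sampling flag are hypothetical, and the case-insensitive match on the property name is an assumption.

```python
def to_alation_uri(jdbc_url: str, custom_query_sampling: bool = True) -> str:
    """Turn a JDBC URL copied from Databricks into the form entered in Alation.

    Hypothetical helper: strips the leading 'jdbc:' prefix and, when custom
    query-based sampling or profiling is needed, forces UseNativeQuery=0.
    """
    uri = jdbc_url.strip()
    if uri.lower().startswith("jdbc:"):
        uri = uri[len("jdbc:"):]

    if custom_query_sampling:
        # The URI is a ';'-separated list: the first item is the address,
        # the remaining items are key=value properties.
        parts = [p for p in uri.split(";") if p]
        head, props = parts[0], parts[1:]
        # Drop any existing UseNativeQuery setting (assumed case-insensitive).
        props = [
            p for p in props
            if p.split("=", 1)[0].strip().lower() != "usenativequery"
        ]
        props.append("UseNativeQuery=0")
        uri = ";".join([head] + props)
    return uri
```

For example, calling to_alation_uri on a URL that starts with jdbc:databricks:// returns the same string starting with databricks:// and ending with UseNativeQuery=0.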
Connection String for Databricks JDBC Driver¶
Format¶
databricks://<hostname>:443/default;transportMode=http;ssl=1;httpPath=<databricks_http_path_prefix>/<databricks_cluster_id>;AuthMech=3;UseNativeQuery=0
Example¶
databricks://9127430235515120.0.gcp.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/7298139425519230/2314-341457-f6bou5cr;AuthMech=3;UseNativeQuery=0
Connection String for Spark JDBC Driver¶
For more information, see JDBC Spark driver in the Databricks documentation.
Format¶
spark://<hostname>:443/default;transportMode=http;ssl=1;httpPath=<HTTP_Path>;AuthMech=3;UseNativeQuery=0
Example¶
spark://9127430235515120.0.gcp.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/7298139425519230/2314-341457-f6bou5cr;AuthMech=3;UseNativeQuery=0
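As an illustration of how the connector version determines the URI scheme, the following Python sketch assembles either format from a hostname and HTTP path. The helper name build_uri is hypothetical, and the sketch assumes the version string starts with a numeric major version; the fixed properties are copied from the formats above.

```python
def build_uri(connector_version: str, hostname: str, http_path: str) -> str:
    """Assemble the JDBC URI (without the 'jdbc:' prefix) for a connector version.

    Hypothetical helper: versions 2.0.0 and newer get the databricks:// scheme,
    versions below 2.0.0 get the spark:// scheme.
    """
    major = int(connector_version.split(".")[0])
    scheme = "databricks" if major >= 2 else "spark"
    return (
        f"{scheme}://{hostname}:443/default;"
        "transportMode=http;ssl=1;"
        f"httpPath={http_path};"
        "AuthMech=3;UseNativeQuery=0"
    )


# Values taken from the examples above
print(build_uri(
    "2.0.0",
    "9127430235515120.0.gcp.databricks.com",
    "sql/protocolv1/o/7298139425519230/2314-341457-f6bou5cr",
))
```

With a version below 2.0.0, the same call produces the spark:// form of the URI.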
Important
If you haven’t enabled Hive Metastore, you must include the ConnCatalog and ConnSchema parameters in the JDBC URI:
ConnCatalog - Specifies the catalog that contains the metadata schema.
ConnSchema - Specifies the schema inside the catalog in which metadata is stored.
Example: ConnCatalog=en_dlake_cat;ConnSchema=data_governance;
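A short Python sketch of appending these two parameters to an existing URI follows. The helper name add_metadata_location is hypothetical, and placing the parameters at the end of the URI with a trailing semicolon (as in the example above) is an assumption; the source does not state that position matters.

```python
def add_metadata_location(uri: str, catalog: str, schema: str) -> str:
    """Append ConnCatalog and ConnSchema to a JDBC URI string.

    Hypothetical helper for setups where Hive Metastore is not enabled.
    """
    base = uri.rstrip(";")  # avoid a doubled ';' if the URI already ends with one
    return f"{base};ConnCatalog={catalog};ConnSchema={schema};"


# Catalog and schema names are taken from the example above
print(add_metadata_location(
    "databricks://<hostname>:443/default;AuthMech=3;UseNativeQuery=0",
    "en_dlake_cat",
    "data_governance",
))
```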
Provide the JDBC URI in Alation¶
To provide the JDBC URI in the Alation UI, perform these steps:
On the Settings page of your Databricks on Google Cloud OCF data source, go to the General Settings tab.
Go to the Connector Settings > Datasource Connection section and enter the JDBC URI.
Follow the correct JDBC URI format. For details, see Connection String for Databricks JDBC Driver and Connection String for Spark JDBC Driver.
Click Save.
Configure Authentication¶
The connector supports Token-Based Authentication.
Generate a personal access token as documented in Manage Personal Access Token.
Configure Token-Based Authentication¶
To configure token-based authentication in the Alation UI, perform these steps:
On the Settings page of your Databricks on Google Cloud OCF data source, go to the General Settings tab.
Go to the Connector Settings > Datasource Connection section and provide the following information:
Parameter | Description
Username | Specify the token value for the service account.
Password | Paste the personal access token for the service account.
Click Save.
Configure Additional Connection Settings¶
In addition to the mandatory connection settings on the General Settings tab, you can configure the following additional settings:
Configure Additional Data Source Connections¶
Alation can associate objects in a data source with objects in another source in the catalog through lineage. For example, you can show lineage between your data source and BI sources that use its data.
Provide additional connection information for the data source to see lineage across multiple sources on the Lineage chart.
In the Application Settings section of the General Settings tab, provide the host and port information in the Additional data source connection field.
This parameter is used to generate lineage between the current data source and another source in the catalog, for example a BI source that retrieves data from the underlying database. The parameter accepts host and port information of the corresponding BI data source connection.
For more details, see Configure Cross-Source Lineage.
Enable or Disable Automatic Lineage Generation¶
You can enable or disable automatic lineage generation for the data source. When it is enabled, lineage is generated during metadata extraction and from Data Definition Language (DDL) queries run by users in Compose.
Go to General Settings > Application Settings on the Settings page of your Databricks on Google Cloud OCF data source and set the Disable automatic lineage generation toggle:
Clear the toggle to generate lineage automatically.
Select the toggle if you do not want lineage to be generated automatically and prefer to create lineage manually or through an API.
By default, automatic lineage generation is enabled.
Disable Obfuscate Literals¶
Turn on the Obfuscate Literals toggle to hide actual values in the query statements that are ingested during query log ingestion or executed in Compose.
By default, this option is disabled.
Configure Logging¶
To set the logging level for your Databricks on Google Cloud OCF data source logs, perform these steps:
On the Settings page of your data source, go to the Logging Configuration section on the General Settings tab.
Select a logging level for the connector logs and click Save.
The available log levels are based on the Log4j framework.
You can view the connector logs in Admin Settings > Server Admin > Manage Connectors > GCPDatabricks OCF Connector.
Test the Connection¶
The connection test checks database connectivity. Alation uses the JDBC URI to connect to the database and confirms whether the connection can be established.
After specifying the JDBC URI and configuring authentication, test the connection.
To validate network connectivity, go to General Settings > Test Connection on the Settings page of your Databricks on Google Cloud OCF data source and click Test.
A dialog box appears confirming the status of the connection test.
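If the test fails, a basic TCP reachability check can help separate network problems from URI or authentication problems. The Python sketch below is an assumption-laden aid and does not reproduce what Alation's test does: the function name can_reach is hypothetical, it only checks that a TCP connection to the host on port 443 (the port used in the URI formats above) can be opened, and it must be run from a machine with the same network path to Databricks as the connector.

```python
import socket


def can_reach(hostname: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to hostname:port can be opened.

    Hypothetical pre-check only: it does not validate the JDBC URI,
    the HTTP path, or the personal access token.
    """
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Call it with the hostname from your JDBC URI, for example can_reach("<hostname>"). A False result points to DNS, firewall, or routing issues rather than to the connector settings.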