Databricks for Custom DB
Overview
Databricks on platforms such as Azure and Google Cloud Platform (GCP) is supported in Alation as a Custom DB data source type. Alation recommends the Simba Spark JDBC Driver for both Databricks on Azure and Databricks on GCP. The driver is available by default in Alation. For the supported driver version, refer to the Support Matrix for your Alation version.
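For orientation, the sketch below assembles a JDBC URL of the shape commonly used with the Simba Spark JDBC Driver for a Databricks cluster. It is illustrative only: the function name `build_spark_jdbc_uri` and the `include_jdbc_prefix` flag are hypothetical, the host and HTTP path are placeholders to be copied from the cluster's JDBC/ODBC settings, and credentials (typically the user name `token` with a personal access token as the password when `AuthMech=3` is used) are assumed to be supplied separately. Whether the URI field in Alation expects the `jdbc:` prefix is something to confirm in the platform-specific setup section.

```python
def build_spark_jdbc_uri(host: str, http_path: str, port: int = 443,
                         include_jdbc_prefix: bool = True) -> str:
    """Assemble a Simba Spark JDBC URL for a Databricks cluster.

    host and http_path are placeholders: copy the real values from the
    cluster's JDBC/ODBC settings in the Databricks workspace.
    """
    if not host or not http_path:
        raise ValueError("host and http_path are both required")

    properties = [
        "transportMode=http",     # Thrift over HTTP
        "ssl=1",                  # encrypt the connection
        f"httpPath={http_path}",  # cluster-specific HTTP path
        "AuthMech=3",             # user name and password authentication
    ]
    prefix = "jdbc:" if include_jdbc_prefix else ""
    return f"{prefix}spark://{host}:{port}/default;" + ";".join(properties)


if __name__ == "__main__":
    # Placeholder values, not a real workspace.
    print(build_spark_jdbc_uri("example.cloud.databricks.com",
                               "sql/protocolv1/o/0/0000-000000-example"))
```

Keeping credentials out of the URL string avoids leaking the token into logs or saved settings.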
To set up the connection for Databricks on Azure or GCP, see the section dedicated to each source.
Databricks Objects to Alation Objects Mapping
| Databricks Object | Alation Object | Notes |
|---|---|---|
| Cluster | Data Source | Metadata extraction (MDE) is always performed against a concrete cluster, which must be running at the time. |
| Hive Metastore | Data Source | The Hive Metastore is required to perform MDE. |
| Notebook | N/A | Alation does not usually catalog code other than SQL. |
| Schema | Schema | A collection of tables. |
| Table / DataFrame | Table | Alation only knows about DataFrames that have been registered as tables in the Hive Metastore. |
| Column | Attribute | Complex data types may not be fully supported. |
| Spark SQL Query | Query | Starting with 2020.3, query log ingestion (QLI) is supported for Spark SQL queries. Requires configuration. |
| Functions / procedures | Functions | Not supported. |

Service Account: An account specific for each user, so that Compose can be used with the correct privileges and QLI can be attributed to the right user.
DB Account: An account specific for each user, so that Compose can be used with the correct privileges and QLI can be attributed to the right user.
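The Table / DataFrame mapping above depends on the DataFrame being registered as a table in the metastore. A minimal PySpark sketch of that step follows. The helper name `register_dataframe` and the overwrite mode are assumptions chosen for illustration. `DataFrameWriter.saveAsTable` persists the table in the metastore, whereas a temporary view created with `createOrReplaceTempView` exists only for the Spark session and would not be registered there.

```python
from typing import Any


def register_dataframe(df: Any, schema: str, table: str) -> str:
    """Persist a Spark DataFrame as a metastore table and return its name.

    df is expected to be a pyspark.sql.DataFrame. The schema (database)
    is assumed to exist already.
    """
    qualified_name = f"{schema}.{table}"
    # saveAsTable writes the data and records the table in the metastore.
    # "overwrite" replaces an existing table of the same name.
    df.write.mode("overwrite").saveAsTable(qualified_name)
    return qualified_name
```

In a Databricks notebook this might be called as `register_dataframe(df, "sales", "orders")`, where `sales` and `orders` are placeholder names. The table should then be picked up by the next metadata extraction run against the cluster.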
Troubleshooting
Logs to collect and review:
For issues related to MDE: taskserver.log and taskserver_err.log.
For issues related to Compose: connector.log and connector_err.log.
For any other errors: alation-error.log and alation-debug.log.
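The mapping above can be captured in a small helper when gathering logs for a support request. This is only a sketch: the names `LOGS_BY_AREA`, `FALLBACK_LOGS`, and `logs_to_collect` are hypothetical, and the log directory is passed in by the caller because its location is not stated here.

```python
from pathlib import Path
from typing import List

# Log file names per problem area, as listed above.
LOGS_BY_AREA = {
    "mde": ["taskserver.log", "taskserver_err.log"],
    "compose": ["connector.log", "connector_err.log"],
}
# Used for any area not listed in LOGS_BY_AREA.
FALLBACK_LOGS = ["alation-error.log", "alation-debug.log"]


def logs_to_collect(area: str, log_dir: str) -> List[Path]:
    """Return the log file paths to review for the given problem area."""
    names = LOGS_BY_AREA.get(area.strip().lower(), FALLBACK_LOGS)
    return [Path(log_dir) / name for name in names]
```

For example, `logs_to_collect("MDE", log_dir)` yields the two task server log paths under `log_dir`, and an unrecognized area falls back to the general error and debug logs.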