Azure Databricks OCF Connector: Overview
Applies to: Alation Cloud Service instances and customer-managed instances of Alation
The OCF connector for Azure Databricks was developed by Alation and is available as a ZIP file that can be uploaded and installed in the Alation application. The connector is compiled together with the required database driver, so you do not need to obtain or install the driver separately.
To download the Azure Databricks OCF connector package, open the Alation Connector Hub from the Customer Portal: Customer Portal > Connectors > Alation Connector Hub. Only Alation users with access to the Customer Portal can access the Alation Connector Hub. If you don’t have access to the Customer Portal, contact Alation Support.
Use this connector to catalog Azure Databricks or Azure Databricks on Azure Government Cloud as a data source on customer-managed (on-premise) and Alation Cloud Service instances of Alation. It extracts and catalogs database objects such as tables, views, and columns. After the metadata is extracted, it is represented in the data catalog as a hierarchy of catalog pages under the parent data source. Alation users can use the full catalog functionality to search for the extracted metadata, curate the corresponding catalog pages, create documentation about the data source, and exchange information about it.
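The hierarchy described above can be pictured with a small sketch. This is only an illustration of the data source > schema > table or view > column nesting; the class names and the `catalog_paths` helper are hypothetical and are not part of Alation or the connector.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical classes that mirror the nesting of catalog pages.
@dataclass
class Column:
    name: str
    comment: str = ""


@dataclass
class Table:
    name: str
    is_view: bool = False
    columns: List[Column] = field(default_factory=list)


@dataclass
class Schema:
    name: str
    tables: List[Table] = field(default_factory=list)


@dataclass
class DataSource:
    title: str
    schemas: List[Schema] = field(default_factory=list)


def catalog_paths(source: DataSource) -> List[str]:
    """List one path per catalog page, parent pages before their children."""
    paths = [source.title]
    for schema in source.schemas:
        schema_path = f"{source.title}/{schema.name}"
        paths.append(schema_path)
        for table in schema.tables:
            table_path = f"{schema_path}/{table.name}"
            paths.append(table_path)
            paths.extend(f"{table_path}/{col.name}" for col in table.columns)
    return paths


if __name__ == "__main__":
    example = DataSource(
        "Azure Databricks",
        [Schema("sales", [Table("orders", columns=[Column("order_id")])])],
    )
    for path in catalog_paths(example):
        print(path)
```

Each printed path corresponds to one page in the hierarchy, with the data source page at the top.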
Team
The following administrators are required to install this connector:
Alation administrator
Installs the connector.
Creates and configures the Azure Databricks data source in the catalog.
Azure Databricks administrator
Creates a service account for Alation.
Provides the JDBC URI to access metadata.
Provides access to schemas and tables to extract metadata.
Assists with configuring Query Log Ingestion (QLI).
Assists with configuring OAuth authentication for Compose.
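As an illustration of the JDBC URI that the Azure Databricks administrator provides, the sketch below assembles one from a workspace host and an HTTP path. It is a minimal example under stated assumptions: the property names (`transportMode`, `ssl`, `httpPath`, `AuthMech`) are connection properties of the Databricks JDBC driver, while the function name, the default scheme, and the sample host and path are hypothetical. This overview does not state the exact URI format that Alation expects, so check the connector's configuration instructions before using a URI built this way.

```python
def build_databricks_jdbc_uri(
    host: str,
    http_path: str,
    port: int = 443,
    scheme: str = "databricks",  # assumption: older drivers use "spark" instead
) -> str:
    """Assemble a JDBC-style URI for a Databricks workspace (illustrative only).

    The personal access token is deliberately left out of the URI so that it
    can be supplied separately as a credential.
    """
    if not host or not http_path:
        raise ValueError("host and http_path are required")

    # Driver connection properties; AuthMech=3 selects user/password-style
    # authentication, which is how personal access tokens are passed.
    properties = {
        "transportMode": "http",
        "ssl": "1",
        "httpPath": http_path,
        "AuthMech": "3",
    }
    property_string = ";".join(f"{key}={value}" for key, value in properties.items())
    return f"{scheme}://{host}:{port}/default;{property_string};"


if __name__ == "__main__":
    # Placeholder values, not a real workspace.
    print(
        build_databricks_jdbc_uri(
            host="adb-1234567890123456.7.azuredatabricks.net",
            http_path="sql/protocolv1/o/1234567890123456/0000-000000-example",
        )
    )
```

Keeping the token out of the URI means the URI can be shared between administrators without exposing a secret.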
Scope
The table below shows which metadata objects are extracted by this connector and which features are supported.
Feature | Scope | Availability |
---|---|---|
Authentication | | |
Token-based authentication | Authentication using Databricks personal access tokens | Yes |
Metadata extraction (MDE) | | |
Default MDE | Extracts metadata based on default extraction queries in the connector code | Yes |
Custom query-based MDE | Extracts metadata based on custom extraction queries provided by the user | No |
Popularity | Indicator of the popularity (intensity of use) of a data object, such as a table or a column | Yes |
Extracted metadata objects | | |
Schemas | List of schemas | Yes |
Tables | List of tables | Yes |
Columns | List of columns | Yes |
Column comments | Column comments | Yes |
Column data types | Column data types | No |
Views | List of views | Yes |
Source comments | Source comments | Yes |
Primary keys | Primary key information for extracted tables | No |
Foreign keys | Foreign key information for extracted tables | No |
Functions | Function metadata | No |
Function definitions | Function definition metadata | No |
Sampling and Profiling | | |
Table sampling | Extracts data samples from extracted tables | Yes |
Column sampling | Extracts data samples from extracted columns | Yes |
Deep column profiling | Profiling of specific columns with the calculation of value distribution stats | Yes |
Dynamic profiling | Table and column profiling by individual users who use their own database accounts to retrieve the profiles | Yes |
Custom query-based table sampling | Ability to use custom queries for sampling specific tables | Yes |
Custom query-based column sampling | Ability to use custom queries for sampling specific columns | Yes |
Query Log Ingestion (QLI) | | |
File-based QLI | Ingestion of query history based on log files that contain query history data | Yes |
Table-based QLI | Ingestion of query history based on a table that contains query history data | Yes |
Query-based QLI | Ingestion of query history based on a custom query | No |
JOINs and filters | Calculation of JOIN and filter information based on ingested query history | Yes |
Predicates | Ability to parse predicates in ingested queries | Yes |
Lineage | | |
Automatic lineage generation | Auto-calculation of lineage based on query history ingested from QLI, MDE, and Compose queries | Yes |
Compose | | |
Customer-managed (on-premise) instances | Compose on on-premise Alation instances | Yes |
Alation Cloud Service instances | Depending on your network configuration, you may be using Alation Agent to connect to your data source. Compose via Agent is supported from connector version 1.1.0.4607. | Yes |
Personal Access Token (PAT) authentication in Compose | Authentication in Compose with username and password | Yes |
SSO through OAuth in Compose | Authentication in Compose with OAuth via Azure Active Directory. OAuth authentication is supported from connector version 1.0.1.2340. | Yes |
Metastore Support
We have certified the Azure Databricks connector with Hive as the metastore. We do not certify external metastores, such as AWS Glue or Derby.