Hive Overview

Hive databases can be deployed on different platforms with various execution engines in the middle layer, so many different Hive configurations exist. Alation is working toward supporting any Hive setup with as little configuration effort on the user’s part as possible.

How you connect to a Hive data source in Alation depends on which Alation release you are using.

In releases before V R5 (5.9.x), you configure Hive data sources by entering connection parameters manually in the Alation UI.

Starting with release V R5 (5.9.x), you can connect to a Hive database by uploading its client configuration files. Configuration upload-based Hive connection is a Labs/Feature Configuration feature and must be enabled with the dedicated feature switch in Admin Settings > Labs/Feature Configuration. The configuration upload-based Hive framework focuses specifically on Query Log Ingestion (QLI) for Hive, allowing Alation to support QLI for a wider range of Hive setups.

Note

The Hive documentation refers to the configuration-based Hive framework, available from release V R5, as “Hive by configurations upload” or “configuration-based Hive”. The Hive framework that is available by default and existed before V R5 is referred to as the “default Hive framework” or “default Hive”.

If you enable Hive by configuration upload (V R5+), all existing Hive sources previously added using the default framework will remain fully functional. Only the new Hive sources you add will require the configurations to be uploaded. You can choose to migrate your existing default Hive sources to the new Hive framework after you have enabled this feature.

Default vs. Configuration-Based Hive Frameworks

The default Hive support covers:

Component             | CDH Support    | HDP Support    | EMR Support    | MapR Support
----------------------|----------------|----------------|----------------|---------------
Simple Authentication | Y              | Y              | Y              | Y
Kerberos              | Y              | Y              | Y              | N
Azure Encryption      | Not applicable | N (Azure HDP)  | Not applicable | Not applicable
Kerberos Knox         | Not applicable | MDE but no QLI | Not applicable | Not applicable
LDAP Knox             | Not applicable | MDE but no QLI | Not applicable | Not applicable
SSL                   | Y              | Y              | Y              | N
WebHDFS SSL           | Y              | Y              | Y              | Not applicable
HttpFS Kerberos       | Not applicable | Not applicable | Not applicable | N
Wire-level Security   | Not applicable | Not applicable | Not applicable | N
MapR SASL             | Not applicable | Not applicable | Not applicable | N

With configuration-based Hive, the level of support increases to:

Component             | CDH Support    | HDP Support     | EMR Support    | MapR Support
----------------------|----------------|-----------------|----------------|---------------
Simple Authentication | Y              | Y               | Y              | Y
Kerberos              | Y              | Y               | Y              | Y
Azure Encryption      | Not applicable | Y (Azure HDP)   | Not applicable | Not applicable
Kerberos Knox         | Not applicable | Y (MDE and QLI) | Not applicable | Not applicable
LDAP Knox             | Not applicable | Y (MDE and QLI) | Not applicable | Not applicable
SSL                   | Y              | Y               | Y              | Y
WebHDFS SSL           | Y              | Y               | Y              | Not applicable
HttpFS Kerberos       | Not applicable | Not applicable  | Not applicable | Y
Wire-level Security   | Not applicable | Not applicable  | Not applicable | Y
MapR SASL             | Not applicable | Not applicable  | Not applicable | Y
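
As a quick way to compare the two support tables, the sketch below transcribes them into Python dictionaries and lists every cell whose value differs. The dictionary names, the shortened cell labels ("NA", "MDE only", "MDE+QLI"), and the helper function are illustrative assumptions and are not part of Alation.

```python
# Support matrices transcribed from the two tables in this section.
# Column order is (CDH, HDP, EMR, MapR). "NA" stands for "Not applicable".
# All names and shortened labels here are illustrative, not Alation identifiers.
PLATFORMS = ("CDH", "HDP", "EMR", "MapR")

DEFAULT_HIVE = {
    "Simple Authentication": ("Y", "Y", "Y", "Y"),
    "Kerberos": ("Y", "Y", "Y", "N"),
    "Azure Encryption": ("NA", "N", "NA", "NA"),
    "Kerberos Knox": ("NA", "MDE only", "NA", "NA"),
    "LDAP Knox": ("NA", "MDE only", "NA", "NA"),
    "SSL": ("Y", "Y", "Y", "N"),
    "WebHDFS SSL": ("Y", "Y", "Y", "NA"),
    "HttpFS Kerberos": ("NA", "NA", "NA", "N"),
    "Wire-level Security": ("NA", "NA", "NA", "N"),
    "MapR SASL": ("NA", "NA", "NA", "N"),
}

CONFIG_BASED_HIVE = {
    "Simple Authentication": ("Y", "Y", "Y", "Y"),
    "Kerberos": ("Y", "Y", "Y", "Y"),
    "Azure Encryption": ("NA", "Y", "NA", "NA"),
    "Kerberos Knox": ("NA", "MDE+QLI", "NA", "NA"),
    "LDAP Knox": ("NA", "MDE+QLI", "NA", "NA"),
    "SSL": ("Y", "Y", "Y", "Y"),
    "WebHDFS SSL": ("Y", "Y", "Y", "NA"),
    "HttpFS Kerberos": ("NA", "NA", "NA", "Y"),
    "Wire-level Security": ("NA", "NA", "NA", "Y"),
    "MapR SASL": ("NA", "NA", "NA", "Y"),
}


def gained_support():
    """Return (component, platform, before, after) for every cell that differs."""
    changes = []
    for component, before_row in DEFAULT_HIVE.items():
        after_row = CONFIG_BASED_HIVE[component]
        for platform, before, after in zip(PLATFORMS, before_row, after_row):
            if before != after:
                changes.append((component, platform, before, after))
    return changes


for component, platform, before, after in gained_support():
    print("{} on {}: {} -> {}".format(component, platform, before, after))
```

Under this transcription, every difference between the two frameworks falls in the HDP and MapR columns.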

Limitations

The Compose tool does not support Keytab or pre-cached Kerberos ticket-based authentication.

Which Hive Framework Should You Use?

Analyze your Hive setup. If it is supported only by the configuration-based Hive framework (for example, HDP 3 + Hive 3 + Tez) and not by the default Hive framework, Hive by configuration upload is your only option.

If your Hive setup is supported by both the default and the configuration-based frameworks, note that configuration-based Hive has several configuration advantages:

  • It requires fewer parameters to be entered in the Alation UI. For example, you do not need to ask your Hadoop admin for, and then manually type, information such as the Metastore URI, log storage paths, and WebHDFS credentials and endpoints for QLI. Alation obtains this information automatically by parsing the uploaded Hive configuration files.

  • The configuration-based framework is designed to handle Hive cataloging problems that the default framework did not anticipate, which improves the chances of your Hive source working on the first try.
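
Hive client configuration files such as hive-site.xml use the standard Hadoop XML layout of name/value properties, which is why values like the Metastore URI can be read from them instead of being typed by hand. The sketch below illustrates the idea on a made-up excerpt. It is not Alation's parser, and the host name, the sample content, and the helper function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of a hive-site.xml client configuration file.
# The host name and values are invented for illustration.
SAMPLE_HIVE_SITE = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore.example.com:9083</value>
  </property>
  <property>
    <name>hive.execution.engine</name>
    <value>tez</value>
  </property>
</configuration>
"""


def read_hadoop_properties(xml_text):
    """Parse Hadoop-style configuration XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    properties = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            # A property with no <value> element maps to an empty string.
            properties[name.strip()] = (prop.findtext("value") or "").strip()
    return properties


props = read_hadoop_properties(SAMPLE_HIVE_SITE)
print("Metastore URI:", props.get("hive.metastore.uris"))
print("Execution engine:", props.get("hive.execution.engine"))
```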

Important

Several Hive setups can use only the default Hive framework for QLI. These are:

  • SparkSql over Hive Metastore (on both CDH and HDP)

  • Hive on EMR with QLI over Amazon S3

  • Hive with AWS Glue as Metastore

If you have such sources, do NOT enable configuration-based Hive.
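
The guidance in this section can be summarized as a small decision helper. This is only a sketch of the rules as stated here: the setup labels, the function name, and the "unsupported" fallback are assumptions made for illustration, and whether a given setup is supported by each framework still has to be checked against the support tables.

```python
# Hypothetical labels for the setups listed as relying on the default
# framework for QLI. These strings are not Alation identifiers.
DEFAULT_ONLY_QLI_SETUPS = {
    "sparksql_over_hive_metastore",
    "emr_qli_over_s3",
    "aws_glue_metastore",
}


def recommend_framework(setups_in_use, default_supported, config_supported):
    """Suggest a Hive framework following the rules described in this section.

    setups_in_use: labels describing the Hive sources you have or plan to add.
    default_supported / config_supported: whether your setup is covered by
    each framework, as determined from the support tables.
    """
    # Sources that rely on the default framework for QLI rule out enabling
    # configuration-based Hive.
    if DEFAULT_ONLY_QLI_SETUPS & set(setups_in_use):
        return "default"
    # Configuration-based Hive is either the only path or the preferred one.
    if config_supported:
        return "configuration-based"
    if default_supported:
        return "default"
    # Fallback added for completeness; the text does not cover this case.
    return "unsupported"


print(recommend_framework(["hdp3_hive3_tez"], False, True))
print(recommend_framework(["aws_glue_metastore"], True, True))
```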