HDFS OCF Connector: Overview

Alation Cloud Service Applies to Alation Cloud Service instances of Alation

Customer Managed Applies to customer-managed instances of Alation

Available from release 2022.3.2

The OCF connector for HDFS is developed by Alation and available on demand as a Zip file for you to upload and install in the Alation application.

The latest HDFS OCF connector package is available on the Connector Hub. Ask an Alation admin with access to the Customer Portal to download the connector from the Connectors section of the Portal (Customer Portal > Connectors > Alation Connector Hub).

Use the OCF connector for HDFS to catalog HDFS as a file system source in Alation. It extracts HDFS objects such as root folders along with their content, which includes files and folders. It enables catalog users to discover, search, browse, and curate HDFS objects, such as files and folders, from the Alation user interface. The HDFS OCF connector can be used to catalog metadata from the HDFS file system deployed on the Cloudera Distributed Hadoop (CDH) or Cloudera Data Platform (CDP) platforms.

Team

The following administrators are required to install this connector:

  • Alation Server Admin:

    • Validates the availability of Alation Connector Manager or installs it

    • Installs the connector

    • Adds and configures the HDFS file system source in Alation

  • Kerberos Server Administrator:

    • Provides the Kerberos configuration file (krb5.conf)

Scope

The table below lists the features supported by the connector.

Feature

Scope

Availability

Authentication

Basic (username)

Authentication with a service account.

Yes

SSL

Connection over the TLS protocol.

Yes

Kerberos

Authentication with the Kerberos protocol.

Yes

LDAP

Authentication with the LDAP protocol.

No

OAuth

Authentication with the OAuth 2.0 protocol.

No

SSO

Authentication using an SSO flow through an IdP application.

No

Metadata extraction (MDE)

Metadata Extraction uses the WebHDFS REST API. The HDFS OCF connector calls the List a Directory API to get the list of files and folders.

Yes

Extracted metadata objects

Files

Files on the HDFS server.

Yes

Folders

Folders on the HDFS server.

Yes

Attributes or columns

Table attributes or columns within a file or folder on the HDFS server.

Yes

Object permissions

Information related to object permissions.

Yes

Object owner

Information related to the object owner.

Yes

Object group

Object owner group name.

Yes

Lineage

Automatic lineage generation

Auto-calculation of lineage based on query history ingested from MDE queries.

Yes

Direct lineage

Extraction of lineage from system tables during MDE.

Yes

Column-level lineage

Extraction of lineage on the column level

No

Object Mapping

File System Object

Alation Object

File system

File system source.

Folder

Directory or File.

Note

During ingestion, Folder is considered as File and during search, Folder is considered as Directory.

File

File