Prerequisites

Alation Cloud Service: applies to Alation Cloud Service instances of Alation

Customer Managed: applies to customer-managed instances of Alation

Before you install and configure the OCF connector for Google BigQuery, ensure that you have performed the following:

Enable APIs

Enable the following APIs for the OCF connector for Google BigQuery:

  • BigQuery API

  • Cloud Resource Manager API (Optional)

Enable BigQuery API

Enable the BigQuery API for projects you want to catalog.

Note

By default, the BigQuery API is enabled for all newly created projects. To verify, go to APIs & Services > Enabled APIs & services on Google Cloud Platform. If the API is disabled, enable it by clicking ENABLE APIS AND SERVICES and selecting the BigQuery API.

Enable Cloud Resource Manager API

Note

This is optional.

Enable the Cloud Resource Manager API for the project that you configure in the JDBC URI. Alation uses this API to perform permission checks for metadata extraction (MDE) and query log ingestion (QLI). Although the API is optional, Alation recommends enabling it so that the access check can complete and help you identify possible permission issues before you run MDE or QLI.
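One way to confirm the state of both APIs is to read the state field that the Google Service Usage API reports for each service. The following is a minimal sketch that only interprets such states; the authenticated call that fetches them is omitted. The function names and the sample states are hypothetical, and accepting a project ID (rather than a project number) in the resource name is an assumption to verify for your environment.

```python
# Service identifiers for the two APIs discussed in this section.
REQUIRED_SERVICES = ["bigquery.googleapis.com"]
OPTIONAL_SERVICES = ["cloudresourcemanager.googleapis.com"]


def service_resource_name(project, service):
    """Resource name of one service within a project (Service Usage style).

    `project` is assumed to be a project ID or project number.
    """
    return f"projects/{project}/services/{service}"


def disabled_services(states, services):
    """Return the services whose reported state is not ENABLED.

    `states` maps a service identifier to the `state` string taken from a
    Service Usage response; a missing entry is treated as disabled.
    """
    return [s for s in services if states.get(s) != "ENABLED"]


# Hypothetical states, as they might be collected for one project.
sample_states = {
    "bigquery.googleapis.com": "ENABLED",
    "cloudresourcemanager.googleapis.com": "DISABLED",
}

print(disabled_services(sample_states, REQUIRED_SERVICES))
print(disabled_services(sample_states, OPTIONAL_SERVICES))
```

With the sample states above, the required list comes back empty and the optional list reports the Cloud Resource Manager API, which matches a project where only the recommended access check would be unavailable.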

Configure Network Connectivity

Open outbound TCP port 443 to the Google BigQuery server.

Additionally, if you have a firewall enabled, add the following URLs to the allowlist:

  • https://www.googleapis.com/oauth2/v1/certs

  • https://oauth2.googleapis.com/token

  • https://accounts.google.com/o/oauth2/v2/auth

  • https://www.googleapis.com/robot/v1/metadata/x509/

  • https://www.googleapis.com/auth/bigquery

  • .1e100.net
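A quick way to check the network requirement is to attempt an outbound TCP connection on port 443 to each host in the allowlist. The sketch below uses only the Python standard library; the helper names are hypothetical, and the .1e100.net entry is left out because it is a domain suffix rather than a single host.

```python
import socket
from urllib.parse import urlparse

# URLs from the allowlist above.
ALLOWLIST_URLS = [
    "https://www.googleapis.com/oauth2/v1/certs",
    "https://oauth2.googleapis.com/token",
    "https://accounts.google.com/o/oauth2/v2/auth",
    "https://www.googleapis.com/robot/v1/metadata/x509/",
    "https://www.googleapis.com/auth/bigquery",
]


def unique_hosts(urls):
    """Extract the distinct hostnames from the URLs, preserving order."""
    hosts = []
    for url in urls:
        host = urlparse(url).hostname
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def can_connect(host, port=443, timeout=5.0):
    """Return True if an outbound TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_all(urls, port=443):
    """Map each distinct host to the result of a connection attempt."""
    return {host: can_connect(host, port) for host in unique_hosts(urls)}
```

Calling check_all(ALLOWLIST_URLS) from the machine that runs the connector returns a dictionary of host names and booleans. A False value only shows that the TCP connection failed; it does not say whether the firewall, DNS, or a proxy is responsible.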

Create a Service Account

Google service accounts are special accounts that belong to your applications or virtual machines instead of individual end-users. An application uses a service account to call the Google API of a service without users being directly involved in this flow.

Alation has certified the following service account types:

  • SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com

  • PROJECT_NUMBER-compute@developer.gserviceaccount.com (Compute Engine API service account)

Create a Service Account and Generate Key

Each service account is associated with a key managed by the Google Cloud Platform (GCP) and used for service-to-service authentication in GCP.

You can create a new service account for Alation, use an existing service account, or use the default Compute Engine service account. If you choose to use an existing service account or the Compute Engine service account, make sure that you have the account key and key file, or generate new ones to use in Alation.

To create a service account:

  1. Log in to Google Cloud Platform and go to IAM & Admin > Service Accounts.

  2. Click +CREATE SERVICE ACCOUNT.

  3. Enter the required and optional information (name, ID, description) and click Create.

  4. Assign the roles bigquery.jobUser and AlationUser to the service account. Click Done.

  5. In the list of service accounts, for the service account you created, click Actions > Manage keys.

  6. Click ADD KEY > Create new key. Choose JSON as Key type (recommended).

  7. Click Create to generate the key. The key file will be saved to your computer.

  8. Save the name of the key and the information about the location of the key file. They will be required later during the configuration on the Alation side.
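Before you use the downloaded key in Alation, it can help to confirm that the file is a complete service account key and that its account matches one of the certified forms. The following is a minimal sketch under stated assumptions: the field names are the ones commonly found in a Google JSON key file, the two regular expressions are simplified approximations of the certified account forms, and the function names and sample values are hypothetical.

```python
import json
import re

# Fields normally present in a JSON service account key file.
EXPECTED_FIELDS = ("type", "project_id", "private_key_id", "private_key", "client_email")

# Simplified patterns for the two certified account forms.
USER_MANAGED = re.compile(r"^[a-z0-9-]+@[a-z0-9-]+\.iam\.gserviceaccount\.com$")
COMPUTE_DEFAULT = re.compile(r"^\d+-compute@developer\.gserviceaccount\.com$")


def load_key(path):
    """Read and parse a JSON key file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def check_key(key):
    """Return a list of problems found in a parsed key (empty if none)."""
    problems = [f"missing field: {name}" for name in EXPECTED_FIELDS if not key.get(name)]
    if key.get("type") not in (None, "service_account"):
        problems.append("type is not service_account")
    email = key.get("client_email", "")
    if email and not (USER_MANAGED.match(email) or COMPUTE_DEFAULT.match(email)):
        problems.append("client_email does not match a certified account form")
    return problems


# Hypothetical key contents with placeholder values.
sample_key = {
    "type": "service_account",
    "project_id": "my-project",
    "private_key_id": "0000",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    "client_email": "alation-extract@my-project.iam.gserviceaccount.com",
}

print(check_key(sample_key))
```

For a real file, pass the result of load_key to check_key. An empty list means only that the structure looks right; it does not prove that the key is still valid in Google Cloud.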

Create a User Account and an OAuth Client

You can use an OAuth client and a user account for authentication in Alation.

To authenticate with a user account and an OAuth client:

Create a new or designate an existing user account that Alation can use for metadata extraction, sampling and profiling, and QLI. Ensure that this account has permissions listed in the Grant Required Permissions section.

Note

To authenticate in Compose, run query forms, perform dynamic sampling and profiling, and upload data, each user will use their own Google BigQuery account.

Create an OAuth Client

To create an OAuth client:

  1. Log in to Google Cloud Platform and in the left-hand menu select APIs & Services > Credentials.

  2. Click on Create Credentials and select OAuth client ID.

  3. Select Web Application in the Application type dropdown list.

  4. Specify a name.

  5. Under the Authorized redirect URIs section, click ADD URI. You will need to add two URIs to cover full data source functionality: one for extraction, sampling, profiling, and QLI; and the second for Compose, query forms, dynamic sampling and profiling, and data upload.

    Note

    If you choose to authenticate with a service account for extraction, add only the redirect URI for Compose.

    Use the following URI formats:

    • MDE, sampling and profiling, and QLI

      Format:

      http://<hostname>/auth/callback/?method=oauth&config_name=<oauth_config_name>

      Here, <oauth_config_name> is the name of a configuration profile that you will create in Alation later. Save the value that you add to the URI, for example google_oauth_conf, to use in Alation.

      Example:

      http://my-datacatalog.com/auth/callback/?method=oauth&config_name=google_oauth_conf

    • Compose, query forms, dynamic sampling and profiling, and data upload

      Format:

      http://<hostname>/api/datasource_auth/oauth/callback

      Example:

      http://my-datacatalog.com/api/datasource_auth/oauth/callback

  6. Click on Save and copy the client ID and client secret that will be generated. Save them to use later when configuring your data source in Alation.
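The two redirect URI formats in step 5 can be assembled from the Alation hostname and the configuration name. The following is a small sketch with hypothetical function names; it defaults to the http scheme shown in the formats above, and the scheme parameter is an added convenience for instances that use HTTPS.

```python
from urllib.parse import urlencode


def extraction_redirect_uri(hostname, config_name, scheme="http"):
    """Redirect URI for MDE, sampling and profiling, and QLI."""
    query = urlencode({"method": "oauth", "config_name": config_name})
    return f"{scheme}://{hostname}/auth/callback/?{query}"


def compose_redirect_uri(hostname, scheme="http"):
    """Redirect URI for Compose, query forms, dynamic sampling and profiling, and data upload."""
    return f"{scheme}://{hostname}/api/datasource_auth/oauth/callback"


print(extraction_redirect_uri("my-datacatalog.com", "google_oauth_conf"))
print(compose_redirect_uri("my-datacatalog.com"))
```

With the hostname and configuration name from the examples in step 5, the two calls reproduce the example URIs exactly, which is a simple way to avoid typing mistakes when you paste them into the Authorized redirect URIs section.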

Add Additional Configuration

You can additionally configure authentication for Compose, dynamic sampling and profiling, and data upload using a user account and an OAuth client. Use the steps in Create an OAuth Client to create an OAuth client with the Compose redirect URL only. When setting up your data source in Alation, you will need to specify two sets of parameters: for extraction through the service account and for user-initiated connections (Compose, query forms, dynamic sampling and profiling, and data upload).

Grant Required Permissions

The service or user account you want to use for extraction requires a specific set of permissions on Google BigQuery. Assign a predefined role that contains all the required permissions listed below, or create a custom role AlationUser and assign the permissions.

Important

We recommend that you apply all the permissions at the Project level.

Note

  • To extract external tables, the account must have the Storage Object Viewer permission on the storage buckets.

  • If the project has the Google Storage API enabled, add more permissions (see below).

Grant Permissions for Metadata Extraction

Required Permissions            Purpose
bigquery.datasets.get           Retrieves dataset metadata.
bigquery.tables.get             Retrieves table metadata.
bigquery.tables.list            Lists tables and metadata on tables.
resourcemanager.projects.get    Retrieves project names and metadata.

Note

  • Alation performs the permission checks only at the Project level and not at all resource levels.

  • A successful access check in most cases ensures a successful MDE. However, if MDE fails even after the access check completes successfully, check the job history tables for error details.
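If you create the custom role AlationUser instead of using a predefined role, the role definition has to list each permission explicitly. The sketch below builds such a definition for the metadata extraction permissions, using the field names of the Google Cloud IAM Role resource (title, description, stage, includedPermissions). The function name and the description text are hypothetical, and whether your tooling accepts the resulting JSON as a role definition file is an assumption to verify.

```python
import json

# Permissions from the metadata extraction table above.
MDE_PERMISSIONS = [
    "bigquery.datasets.get",
    "bigquery.tables.get",
    "bigquery.tables.list",
    "resourcemanager.projects.get",
]


def custom_role_definition(title, permissions, description=""):
    """Build a role definition using IAM Role resource field names."""
    return {
        "title": title,
        "description": description,
        "stage": "GA",
        # Deduplicate and sort so the output is stable across runs.
        "includedPermissions": sorted(set(permissions)),
    }


role = custom_role_definition(
    "AlationUser",
    MDE_PERMISSIONS,
    description="Metadata extraction for Alation",
)
print(json.dumps(role, indent=2))
```

To cover more features with the same role, extend the permission list with the entries from the other permission tables in this section before generating the definition.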

Grant Permissions for Query Log Ingestion

Required Permissions            Purpose
bigquery.jobs.list              Lists all jobs and retrieves metadata on any job submitted by a user.
bigquery.jobs.listAll           Allows QLI preview and extraction for any user.
resourcemanager.projects.get    Retrieves project names and metadata.

Note

  • Alation performs the permission checks only at the Project level and not at all resource levels.

  • A successful access check in most cases ensures a successful QLI. However, if QLI fails even after the access check completes successfully, check the job history tables for error details.

Grant Permissions for QLI Volume Check (Optional)

Required Permissions            Purpose
bigquery.jobs.listAll           Fetches all queries for a project.
resourcemanager.projects.get    Retrieves project names and metadata.
bigquery.jobs.create            Runs jobs (including queries) within the project.
bigquery.datasets.get           Retrieves dataset metadata.

Note

  • Alation performs the volume check to discover the approximate size of the query history metadata. This helps in determining the QLI run frequency. The size is estimated based on the query volume of the last 7 days.

  • The QLI volume check runs for at most five minutes or until the daily average volume reaches 500K, whichever occurs first.

  • If the daily average volume is more than 500K, Alation recommends that you schedule the QLI job to run daily.
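The recommendation in the note comes down to simple arithmetic: average the query volume over the last 7 days and compare it with the 500K threshold. The following sketch illustrates that comparison only; it is not how Alation implements the volume check. The function names and the sample counts are hypothetical, and treating 500K as a number of queries per day is an assumption.

```python
DAILY_THRESHOLD = 500_000  # the 500K threshold from the note (assumed to count queries)


def daily_average(counts_last_7_days):
    """Average volume per day over the sampled window."""
    return sum(counts_last_7_days) / len(counts_last_7_days)


def recommend_daily_qli(counts_last_7_days):
    """True when the daily average exceeds the threshold."""
    return daily_average(counts_last_7_days) > DAILY_THRESHOLD


# Hypothetical query counts for seven consecutive days (3,850,000 in total).
counts = [420_000, 610_000, 580_000, 390_000, 700_000, 650_000, 500_000]

print(daily_average(counts))
print(recommend_daily_qli(counts))
```

In this example the seven days add up to 3,850,000, so the daily average is 550,000, which is above the threshold and therefore falls under the recommendation to run QLI daily.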

Grant Permissions for Sampling and Profiling

Required Permissions            Description
bigquery.tables.get             Retrieves table metadata.
bigquery.tables.getData         Fetches actual data present in the table.
bigquery.tables.list            Lists tables and metadata on tables.
bigquery.jobs.listAll           Lists all jobs and retrieves metadata of any job submitted by any user.
bigquery.jobs.create            Runs jobs (including queries) within the project.

Note

The bigquery.tables.getData and bigquery.jobs.create permissions are optional for a service account. You can omit them if granting access to table data raises security concerns. However, both permissions are required to perform sampling with either a service account or a user account, and user accounts also need them to use the dynamic sampling option.

Grant Permissions for External Table Extraction

Required Permissions            Description
bigquery.readsessions.create    Creates a session to stream large results.

Grant Permissions for Projects with Google Storage API Enabled

Required Permissions            Description
bigquery.readsessions.create    Creates a session to stream large results.
bigquery.readsessions.getData   Retrieves data from the session.
bigquery.readsessions.update    Cancels the session.
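Because several features share permissions, it can be easier to work with the union of the permission tables in this section and compare it with what the account actually holds. The following sketch does that comparison. The feature keys and function names are hypothetical, and the list of granted permissions is assumed to come from a separate check in Google Cloud (for example, a test of IAM permissions on the project), which is not shown here.

```python
# Permission sets copied from the tables in this section.
REQUIRED = {
    "metadata_extraction": {
        "bigquery.datasets.get",
        "bigquery.tables.get",
        "bigquery.tables.list",
        "resourcemanager.projects.get",
    },
    "query_log_ingestion": {
        "bigquery.jobs.list",
        "bigquery.jobs.listAll",
        "resourcemanager.projects.get",
    },
    "qli_volume_check": {
        "bigquery.jobs.listAll",
        "resourcemanager.projects.get",
        "bigquery.jobs.create",
        "bigquery.datasets.get",
    },
    "sampling_and_profiling": {
        "bigquery.tables.get",
        "bigquery.tables.getData",
        "bigquery.tables.list",
        "bigquery.jobs.listAll",
        "bigquery.jobs.create",
    },
    "external_table_extraction": {"bigquery.readsessions.create"},
    "storage_api_enabled": {
        "bigquery.readsessions.create",
        "bigquery.readsessions.getData",
        "bigquery.readsessions.update",
    },
}


def required_permissions(features):
    """Union of the permissions needed for the selected features."""
    needed = set()
    for feature in features:
        needed |= REQUIRED[feature]
    return needed


def missing_permissions(granted, features):
    """Sorted list of required permissions that are not in `granted`."""
    return sorted(required_permissions(features) - set(granted))


# Hypothetical account that currently holds only the extraction permissions.
granted = REQUIRED["metadata_extraction"]
print(missing_permissions(granted, ["metadata_extraction", "query_log_ingestion"]))
```

For the hypothetical account above, the result lists bigquery.jobs.list and bigquery.jobs.listAll, the two permissions that QLI needs beyond those already granted for extraction.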

Create an OAuth Configuration for Extraction

Authentication with an OAuth client and a user account for extraction requires creating an OAuth configuration object for Alation AuthService.

To create the configuration object, use the steps in Authentication Configuration Methods for External Systems and the following information:

  • Config Name—Use the same value as in the Authorized redirect URI in the OAuth client settings. See Create a User Account and an OAuth Client.

  • Client Id—Use the client ID of your OAuth client.

  • Client Secret—Use the client secret of your OAuth client.

  • Scope—Use value https://www.googleapis.com/auth/bigquery, https://www.googleapis.com/auth/cloud-platform.read-only

  • Subject (Optional)—Leave blank.

  • Token Buffer time—Set in minutes, for example: 10.

  • Grant Type—Leave the default value. This parameter does not apply to this connector and will not be used.

  • PKCE Verifier—Leave the default value. This parameter does not apply to this connector and will not be used.

  • Authorize Endpoint URL—Use value https://accounts.google.com/o/oauth2/v2/auth?access_type=offline

  • Redirect URL—Use format https://<your_Alation_host>/auth/callback/?method=oauth&config_name=<config_name>. Ensure that HTTPS is configured for your Alation instance, see Configure HTTPS for details.

  • Token Endpoint URL—Use value https://oauth2.googleapis.com/token

  • User Info Endpoint URL— Provide the User Info Endpoint URL for the identity provider.
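The field values above can be collected in one place and checked for consistency before you enter them. The sketch below is only an illustration: the dictionary keys mirror the field labels in this list and are not an Alation API, the function names are hypothetical, and the client ID and secret are placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Scope values listed for the Scope field.
SCOPES = [
    "https://www.googleapis.com/auth/bigquery",
    "https://www.googleapis.com/auth/cloud-platform.read-only",
]


def extraction_oauth_settings(alation_host, config_name, client_id, client_secret):
    """Collect the field values described above (keys are illustrative labels)."""
    query = urlencode({"method": "oauth", "config_name": config_name})
    return {
        "Config Name": config_name,
        "Client Id": client_id,
        "Client Secret": client_secret,
        "Scope": ", ".join(SCOPES),
        "Token Buffer time": 10,  # minutes, the example value from the list
        "Authorize Endpoint URL": "https://accounts.google.com/o/oauth2/v2/auth?access_type=offline",
        "Redirect URL": f"https://{alation_host}/auth/callback/?{query}",
        "Token Endpoint URL": "https://oauth2.googleapis.com/token",
    }


def redirect_is_consistent(settings):
    """Check that the redirect URL uses HTTPS and carries the same config name."""
    parsed = urlparse(settings["Redirect URL"])
    names = parse_qs(parsed.query).get("config_name", [])
    return parsed.scheme == "https" and names == [settings["Config Name"]]


settings = extraction_oauth_settings(
    "my-datacatalog.com", "google_oauth_conf", "example-client-id", "example-secret"
)
print(settings["Redirect URL"])
print(redirect_is_consistent(settings))
```

The consistency check catches the most common mistake, a configuration name in the redirect URL that differs from the Config Name field, which would stop the OAuth callback from finding the configuration.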

After creating the authentication object, configure the settings of your data source.

Note

Authentication with an OAuth client and a user account for Compose and other features that require user-initiated connections does not require an OAuth configuration object. It will need to be configured separately on the Compose tab of the data source settings.