Prerequisites
Applies to Alation Cloud Service and customer-managed instances of Alation.
Before you install and configure the OCF connector for Google BigQuery, complete the following prerequisites:
Enable APIs
Enable the following APIs for the OCF connector for Google BigQuery:
BigQuery API
Cloud Resource Manager API (Optional)
Enable BigQuery API
Enable the BigQuery API for the projects you want to catalog.
Note
By default, the BigQuery API is enabled for all newly created projects. To verify, go to APIs & Services > Enabled APIs & services on Google Cloud Platform. If the API is disabled, enable it by clicking ENABLE APIS AND SERVICES and selecting the BigQuery API.
Enable Cloud Resource Manager API
Note
This is optional.
Enable the Cloud Resource Manager API for the project configured in the JDBC URI so that Alation can perform permission checks for metadata extraction and query log ingestion. Although this API is optional, Alation recommends enabling it: the access check helps you identify permission issues that could cause Metadata Extraction and Query Log Ingestion to fail.
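To confirm which of these APIs are already enabled, you can list a project's enabled services, for example with gcloud services list --enabled --project=PROJECT_ID --format="value(config.name)", and compare the output with the expected service names. The Python sketch below is a hypothetical helper, not part of Alation or the connector; it assumes the BigQuery API and the Cloud Resource Manager API appear as bigquery.googleapis.com and cloudresourcemanager.googleapis.com, one service name per line.

```python
# Hypothetical helper (not part of Alation): checks a list of enabled Google Cloud
# services against the APIs the OCF connector for Google BigQuery uses.
# Assumed input: one service name per line, e.g. the output of
#   gcloud services list --enabled --project=PROJECT_ID --format="value(config.name)"

REQUIRED_APIS = {"bigquery.googleapis.com"}              # BigQuery API
OPTIONAL_APIS = {"cloudresourcemanager.googleapis.com"}  # Cloud Resource Manager API


def check_enabled_apis(enabled_output: str) -> dict:
    """Return the required and optional APIs that are missing from the enabled list."""
    enabled = {line.strip() for line in enabled_output.splitlines() if line.strip()}
    return {
        "missing_required": sorted(REQUIRED_APIS - enabled),
        "missing_optional": sorted(OPTIONAL_APIS - enabled),
    }


if __name__ == "__main__":
    sample = "bigquery.googleapis.com\nstorage.googleapis.com\n"
    print(check_enabled_apis(sample))
```

A non-empty missing_optional list is not an error; it only means the access check described above will not be available.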
Configure Network Connectivity
Open outbound TCP port 443 to the Google BigQuery server.
If a firewall is enabled, add the following URLs to the allowlist:
https://www.googleapis.com/oauth2/v1/certs
https://oauth2.googleapis.com/token
https://accounts.google.com/o/oauth2/v2/auth
https://www.googleapis.com/robot/v1/metadata/x509/
https://www.googleapis.com/auth/bigquery
.1e100.net
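Before running extraction, you may want to confirm that port 443 is open to the hosts behind these URLs. The Python sketch below is a hypothetical helper, not an Alation tool: it derives the unique hostnames from the allowlist URLs and attempts a plain TCP connection to each. It assumes direct outbound access (no proxy), tests only the TCP handshake rather than TLS or HTTP, and skips the .1e100.net entry because that is a domain suffix rather than a single host.

```python
import socket
from urllib.parse import urlsplit

# URLs copied from the allowlist above (".1e100.net" is a domain suffix and is not checked).
ALLOWLIST = [
    "https://www.googleapis.com/oauth2/v1/certs",
    "https://oauth2.googleapis.com/token",
    "https://accounts.google.com/o/oauth2/v2/auth",
    "https://www.googleapis.com/robot/v1/metadata/x509/",
    "https://www.googleapis.com/auth/bigquery",
]


def hosts_to_check(urls):
    """Return the unique hostnames, in first-seen order, that must be reachable on port 443."""
    hosts = []
    for url in urls:
        host = urlsplit(url).hostname
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refused connections, and timeouts
        return False
```

For example, looping over hosts_to_check(ALLOWLIST) and printing can_reach(host) for each gives one result per host; a False result suggests that the firewall or allowlist still blocks that host.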
Create a Service Account
Google service accounts are special accounts that belong to your applications or virtual machines instead of individual end-users. An application uses a service account to call the Google API of a service without users being directly involved in this flow.
Alation has certified the following service account types:
SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com
PROJECT_NUMBER-compute@developer.gserviceaccount.com (Compute Engine API service account)
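If you are unsure which of the two certified forms an existing account matches, the email address alone is enough to tell. The Python sketch below is a hypothetical helper with deliberately loose patterns; it checks only the overall shape of each form and does not enforce every Google Cloud naming rule.

```python
import re

# Loose shape checks for the two certified service account forms listed above.
USER_MANAGED = re.compile(r"[a-z][a-z0-9-]*@[a-z][a-z0-9-]*\.iam\.gserviceaccount\.com")
COMPUTE_DEFAULT = re.compile(r"\d+-compute@developer\.gserviceaccount\.com")


def classify_service_account(email: str) -> str:
    """Return 'user_managed', 'compute_engine_default', or 'unknown'."""
    email = email.strip().lower()
    if COMPUTE_DEFAULT.fullmatch(email):
        return "compute_engine_default"
    if USER_MANAGED.fullmatch(email):
        return "user_managed"
    return "unknown"
```

An "unknown" result, such as for a personal Google account, means the address does not match either of the certified service account forms.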
Create a Service Account and Generate Key
Each service account is associated with a key managed by Google Cloud Platform (GCP) and used for service-to-service authentication in GCP.
You can create a new service account for Alation, use an existing service account, or use the default Compute Engine service account. If you choose an existing service account or the Compute Engine service account, make sure you have the account key and key file, or generate new ones to use in Alation.
To create a service account:
Log in to Google Cloud Platform and go to IAM & Admin > Service Accounts.
Click +CREATE SERVICE ACCOUNT.
Enter the required and optional information (name, ID, description) and click Create.
Assign the roles bigquery.jobUser and AlationUser (the custom role described in Grant Required Permissions) to the service account, and click Done.
In the list of service accounts, for the service account you created, click Actions > Manage keys.
Click ADD KEY > Create new key. Choose JSON as Key type (recommended).
Click Create to generate the key. The key file will be saved to your computer.
Save the name of the key and the information about the location of the key file. They will be required later during the configuration on the Alation side.
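To confirm that the downloaded file is a service account key before you upload it to Alation, you can check its top-level fields. The Python sketch below is a hypothetical helper; it assumes the standard JSON key format that Google Cloud generates (a type of service_account plus fields such as project_id, private_key_id, private_key, and client_email) and inspects only those fields, not whether the key is still active.

```python
import json

# Top-level fields of a Google Cloud service account JSON key that this sketch checks.
# The file also contains other fields (for example client_id and token_uri) that are ignored here.
REQUIRED_FIELDS = ("type", "project_id", "private_key_id", "private_key", "client_email")


def check_key_file(text: str) -> list:
    """Return a list of problems in the key file contents; an empty list means none were found."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]
    if data.get("type") and data["type"] != "service_account":
        problems.append(f"type is {data['type']!r}, expected 'service_account'")
    return problems
```

For example, open the saved key file, pass its contents to check_key_file, and print the result; the function never prints the private key itself.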
Create a User Account and an OAuth Client
You can use an OAuth client and a user account for authentication in Alation.
To authenticate with a user account and an OAuth client:
Create a new user account, or designate an existing one, that Alation can use for metadata extraction, sampling and profiling, and QLI. Ensure that this account has the permissions listed in the Grant Required Permissions section.
Note
To authenticate in Compose, run query forms, perform dynamic sampling and profiling, and upload data, each user will use their own Google BigQuery account.
Create an OAuth Client
To create an OAuth client:
Log in to Google Cloud Platform and, in the left-hand menu, select APIs & Services > Credentials.
Click Create Credentials and select OAuth client ID.
Select Web Application in the Application type dropdown list.
Specify a name.
Under the Authorized redirect URIs section, click ADD URI. You will need to add two URIs to cover full data source functionality: one for extraction, sampling, profiling, and QLI; and the second for Compose, query forms, dynamic sampling and profiling, and data upload.
Note
If you authenticate with a service account for extraction, add only the redirect URI for Compose.
Use the following URI formats:
MDE, sampling and profiling, and QLI
Format:
http://<hostname>/auth/callback/?method=oauth&config_name=<oauth_config_name>
where <oauth_config_name> is the name of a configuration profile that you will create in Alation later. Save the value that you add to the URI to use in Alation, for example, google_oauth_conf.
Example:
http://my-datacatalog.com/auth/callback/?method=oauth&config_name=google_oauth_conf
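Because the config_name value in this URI must later match the Config Name of the OAuth configuration in Alation, it can help to build the URI from the two values rather than type it by hand. The Python sketch below is a hypothetical helper (build_mde_redirect_uri is not an Alation function); it assumes base_url already contains the scheme and host exactly as users reach Alation.

```python
from urllib.parse import urlencode


def build_mde_redirect_uri(base_url: str, config_name: str) -> str:
    """Build the Authorized redirect URI for MDE, sampling and profiling, and QLI.

    base_url is the Alation address including the scheme, e.g. "http://my-datacatalog.com".
    """
    query = urlencode({"method": "oauth", "config_name": config_name})
    return f"{base_url.rstrip('/')}/auth/callback/?{query}"
```

Called with "http://my-datacatalog.com" and "google_oauth_conf", it returns the example URI above; a trailing slash on base_url is tolerated.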
Compose, query forms, dynamic sampling and profiling, and data upload
Format:
Click Save, and copy the generated client ID and client secret. Save them to use later when configuring your data source in Alation.
Add Additional Configuration
You can additionally configure authentication for Compose, dynamic sampling and profiling, and data upload using a user account and an OAuth client. Use the steps in Create an OAuth Client to create an OAuth client with only the Compose redirect URI. When setting up your data source in Alation, you will need to specify two sets of parameters: one for extraction through the service account, and one for user-initiated connections (Compose, query forms, dynamic sampling and profiling, and data upload).
Grant Required Permissions
The service or user account you use for extraction requires a specific set of permissions on Google BigQuery. Assign a predefined role that contains all the required permissions listed below, or create a custom role named AlationUser and assign the permissions to it.
Important
We recommend that you apply all the permissions at the Project level.
Note
To extract external tables, the account must have the Storage Object Viewer role on the storage buckets.
If the project has the Google Storage API enabled, grant the additional permissions listed in Grant Permissions for Projects with Google Storage API Enabled.
Grant Permissions for Metadata Extraction
| Required Permissions | Purpose |
|---|---|
| | Retrieves dataset metadata |
| | Retrieves table metadata |
| | Lists tables and metadata on tables |
| | Retrieves project names and metadata |
Note
Alation performs the permission checks only at the Project level, not at every resource level.
A successful access check usually ensures a successful MDE. However, if MDE fails even after the access checks complete successfully, check the job history tables for error details.
Grant Permissions for Query Log Ingestion
| Required Permissions | Purpose |
|---|---|
| | Lists all jobs and retrieves metadata on any job submitted by a user. |
| | Allows QLI preview and extraction for any user. |
| | Retrieves project names and metadata. |
Note
Alation performs the permission checks only at the Project level, not at every resource level.
A successful access check usually ensures a successful QLI. However, if QLI fails even after the access checks complete successfully, check the job history tables for error details.
Grant Permissions for QLI Volume Check (Optional)
| Required Permissions | Purpose |
|---|---|
| | Fetches all queries for a project. |
| | Retrieves project names and metadata. |
| | Runs jobs (including queries) within the project. |
| | Retrieves dataset metadata. |
Note
Alation performs the volume check to estimate the approximate size of the query history metadata, which helps determine how often to run QLI. The size is estimated from the query volume of the last 7 days.
The QLI volume check runs for at most five minutes or until the daily average volume reaches 500K, whichever occurs first.
If the daily average volume is more than 500K, Alation recommends scheduling the QLI job to run daily.
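As a worked example of this threshold with hypothetical counts: if the last 7 days had 420K, 510K, 600K, 580K, 530K, 450K, and 610K queries, the total is 3,700K and the daily average is about 528.6K. That is above 500K, so a daily QLI schedule would be recommended. The Python sketch below encodes the same arithmetic; it is a hypothetical illustration that assumes simple per-day query counts and does not model the five-minute limit or how Alation actually reads the query history.

```python
# Hypothetical illustration of the recommendation above; not Alation's implementation.
DAILY_THRESHOLD = 500_000  # daily average above which a daily QLI schedule is recommended


def recommend_daily_qli(last_7_days: list) -> tuple:
    """Return (daily_average, recommend_daily) for exactly seven daily query counts."""
    if len(last_7_days) != 7:
        raise ValueError("expected query counts for exactly 7 days")
    daily_average = sum(last_7_days) / 7
    # "More than 500K" is a strict comparison, so exactly 500K does not trigger it.
    return daily_average, daily_average > DAILY_THRESHOLD
```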
Grant Permissions for Sampling and Profiling
| Required Permissions | Description |
|---|---|
| | Retrieves table metadata. |
| | Fetches actual data present in the table. |
| | Lists tables and metadata on tables |
| | Lists all jobs and retrieves metadata of any job submitted by any user. |
| | Runs jobs (including queries) within the project. |
Note
The bigquery.tables.getData and bigquery.jobs.create permissions are optional for a service account. You can omit them if granting access to table data raises security concerns. However, both permissions are required to perform sampling with either a service account or a user account, and user accounts also need them to use the dynamic sampling option for tables.
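The note above comes down to a simple set check: sampling works only when both data-access permissions are granted. The Python sketch below is a hypothetical helper; it assumes you have already collected the account's granted permissions yourself (for example, with the Cloud Resource Manager projects.testIamPermissions method) and checks only the two permissions named in the note.

```python
# Hypothetical helper; checks only the two permissions named in the note above.
SAMPLING_DATA_PERMISSIONS = frozenset({"bigquery.tables.getData", "bigquery.jobs.create"})


def sampling_readiness(granted: set) -> tuple:
    """Return (can_sample, missing), where missing lists the absent sampling permissions."""
    missing = sorted(SAMPLING_DATA_PERMISSIONS - set(granted))
    return not missing, missing
```

A service account that intentionally lacks these permissions will report can_sample as False; that is expected if you chose to skip them.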
Grant Permissions for External Table Extraction
| Required Permissions | Description |
|---|---|
| | Creates a session to stream large results. |
Grant Permissions for Projects with Google Storage API Enabled
| Required Permissions | Description |
|---|---|
| | Creates a session to stream large results. |
| | Retrieves data from the session. |
| | Cancels the session. |
Create an OAuth Configuration for Extraction
Authentication with an OAuth client and a user account for extraction requires creating an OAuth configuration object for Alation AuthService.
To create the configuration object, use the steps in Authentication Configuration Methods for External Systems and the following information:
Config Name — Use the same value as in the Authorized redirect URI in the OAuth client settings. See Create a User Account and an OAuth Client.
Client Id — Use the client ID of your OAuth client.
Client Secret — Use the client secret of your OAuth client.
Scope — Use value https://www.googleapis.com/auth/bigquery, https://www.googleapis.com/auth/cloud-platform.read-only
Subject (Optional) — Leave blank.
Token Buffer time — Set in minutes, for example: 10.
Grant Type — Leave the default value. This parameter does not apply to this connector and will not be used.
PKCE Verifier — Leave the default value. This parameter does not apply to this connector and will not be used.
Authorize Endpoint URL — Use value https://accounts.google.com/o/oauth2/v2/auth?access_type=offline
Redirect URL — Use format https://<your_Alation_host>/auth/callback/?method=oauth&config_name=<config_name>. Ensure that HTTPS is configured for your Alation instance; see Configure HTTPS for details.
Token Endpoint URL — Use value https://oauth2.googleapis.com/token
User Info Endpoint URL — Use value https://www.googleapis.com/oauth2/v3/userinfo
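Two of the values above must agree: the config_name in the Redirect URL must equal the Config Name, and the Redirect URL must use HTTPS. The Python sketch below is a hypothetical pre-flight check, not part of Alation AuthService; it assumes the Redirect URL follows the format shown above and compares only the scheme, the /auth/callback/ path, and the method and config_name query parameters.

```python
from urllib.parse import parse_qs, urlsplit


def check_redirect_url(config_name: str, redirect_url: str) -> list:
    """Return problems with the Redirect URL relative to the Config Name; empty if none found."""
    problems = []
    parts = urlsplit(redirect_url)
    if parts.scheme != "https":
        problems.append("Redirect URL should use HTTPS")
    if parts.path != "/auth/callback/":
        problems.append("path should be /auth/callback/")
    query = parse_qs(parts.query)  # maps each parameter name to a list of values
    if query.get("method") != ["oauth"]:
        problems.append("method should be oauth")
    if query.get("config_name") != [config_name]:
        problems.append("config_name does not match Config Name")
    return problems
```

Running it with the Config Name and Redirect URL you plan to enter, before saving the configuration object, catches a mismatch that would otherwise surface only as a failed OAuth callback.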
After creating the authentication object, configure the settings of your data source.
Note
Authentication with an OAuth client and a user account for Compose and other features that require user-initiated connections does not require an OAuth configuration object. It will need to be configured separately on the Compose tab of the data source settings.