Configure the Data Source Connection¶
Applies to: Alation Cloud Service instances of Alation and customer-managed instances of Alation.
After you install the Azure Cosmos DB OCF connector, you must configure the connection to the Azure Cosmos DB data source.
Configuring the Azure Cosmos DB data source connection settings involves the steps described in the following sections.
Provide Access¶
Go to the Access tab on the Settings page of your Azure Cosmos DB data source and set the data source visibility using one of these options:
Public Data Source — The data source is visible to all users of the catalog.
Private Data Source — The data source is visible to the users allowed access to the data source by Data Source Admins.
You can add new Data Source Admin users in the Data Source Admins section.
Connect to Data Source¶
To establish a connection to the data source, you must:
Provide the JDBC URI¶
Important
We recommend that you provide the values in the corresponding fields on the General Settings page instead of the JDBC URI field in the Datasource Connection section. Leave this field empty if all the connection properties you need are available in the user interface.
JDBC URI Format for Account Key Authentication in Compose¶
Use the following JDBC URI format for account key authentication for Compose:
cosmosdb://AccountEndpoint=<myAccountEndpoint>;AccountKey=<myAccountKey>;
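As a rough illustration, the URI above can be assembled from its two values with a small helper. This is only a sketch: the function name `build_cosmosdb_uri` and the sample endpoint and key are placeholders chosen for this example and are not part of the connector.

```python
def build_cosmosdb_uri(account_endpoint: str, account_key: str) -> str:
    """Assemble a JDBC URI in the account key format shown above."""
    if not account_endpoint or not account_key:
        raise ValueError("Both the account endpoint and the account key are required.")
    # Each property is terminated by a semicolon, as in the documented format.
    return f"cosmosdb://AccountEndpoint={account_endpoint};AccountKey={account_key};"


# Placeholder values for illustration only.
uri = build_cosmosdb_uri("https://myaccount.documents.azure.com:443/", "my-account-key")
print(uri)
```

Keep real account keys out of scripts and shared documents; the placeholder here only shows where the value goes.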
Configure Authentication¶
For metadata extraction (MDE), profiling and sampling, the connector supports the following authentication methods:
Account Key authentication
Account Key
Account Endpoint
Token Type: Master or Resource
Azure authentication
Azure Service Principal
Client Secret
Azure Tenant
Client ID
SSL authentication
SSL Client certificate file
SSL Client certificate file password
(Optional) SSL Server certificate
Important
SSL server certificate isn’t supported. If you use a server certificate for connection, contact Alation Support.
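Before filling in the settings, it can help to check that you have collected every value listed for the method you plan to use. The sketch below encodes the lists above as a lookup table. The names `LISTED_FIELDS` and `missing_fields` are hypothetical, and treating every listed field as required is an assumption of this example, not a statement about the connector.

```python
# Fields listed above for each authentication method (labels as shown in the list).
LISTED_FIELDS = {
    "AccountKey": ["Account Key", "Account Endpoint", "Token Type"],
    "AzureServicePrincipal": ["Client ID", "Client Secret", "Azure Tenant"],
}


def missing_fields(auth_scheme: str, settings: dict) -> list:
    """Return the listed fields that are absent or empty for the given scheme."""
    if auth_scheme not in LISTED_FIELDS:
        raise ValueError(f"Unknown auth scheme: {auth_scheme}")
    return [name for name in LISTED_FIELDS[auth_scheme] if not settings.get(name)]


# Example: the endpoint has not been collected yet.
print(missing_fields("AccountKey", {"Account Key": "my-key", "Token Type": "master"}))
```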
Configure Authentication Scheme Settings¶
On the Settings page of your Azure Cosmos DB data source, go to the General Settings tab.
In the Connector Settings section, provide the following details in the Authentication section.
Field
Description
Auth Scheme
Specify the type of authentication for connecting to Azure Cosmos DB.
Default: AccountKey
Available Values:
AccountKey: Set this to perform authentication with Account Key and Account Endpoint.
AzureServicePrincipal: Set this to authenticate as Azure Service Principal using a Client Secret.
Important
OAuth isn’t supported. Use OAuth fields only for Azure Service Principal authentication scheme.
Account Endpoint
Specify the URL from the Keys blade of the Azure Cosmos DB account.
Account Key
Specify a master key token or a resource token for connecting to the Azure Cosmos DB REST API.
Token Type
Specify the type of token for authentication.
Available Values:
master: Available when an account is created as a set of primary and secondary keys.
resource: Available when users in a database are set up with access permissions for precise access control on a resource, also known as a permission resource.
If you choose to authenticate using Azure, configure Azure Authentication. For details, see Configure Azure Authentication Settings.
Configure Azure Authentication Settings¶
If you choose Azure Service Principal for authentication, follow these steps:
On the Settings page of your Azure Cosmos DB data source, go to the General Settings tab.
In the Connector Settings section, provide the following details in the Azure Authentication section.
Field
Description
Azure Tenant
Specify the Microsoft online tenant to access data.
If unspecified, the default tenant is used.
Azure Environment
Select the environment to use when establishing a connection.
Available Values:
GLOBAL
CHINA
USGOVT
USGOVTDOD
In the Connector Settings section, provide the following details in the OAuth section.
Field
Description
Initiate OAuth
Select this to initiate the process to obtain or refresh the OAuth access token when you connect.
GETANDREFRESH: Indicates that the entire OAuth Flow is handled by the provider. If no token currently exists, it is obtained by prompting the user through the browser. If a token exists, it gets refreshed when applicable.
OAuth Client ID
Specify the assigned client ID when you register your application with an OAuth authorization server.
OAuth Client Secret
Specify the assigned client secret when you register your application with an OAuth authorization server.
Important
OAuth isn’t supported. Use OAuth fields only for Azure Service Principal authentication scheme.
If you choose to encrypt using SSL, configure SSL. For details, see Configure SSL Authentication Settings.
Configure SSL Authentication Settings¶
On the Settings page of your Azure Cosmos DB data source, go to the General Settings tab.
In the Authentication section, select Encrypt.
In the Connector Settings section, provide the following details in the SSL section.
Field
Description
SSL Client Cert
Specify the certificate store for the client certificate and select the appropriate file type from the options under SSL Client Cert Type.
SSL Client Cert Type
Specify the type of key store that contains the SSL client certificate.
Available Options:
USER: (Default) For Windows, this specifies that the certificate store is a certificate store owned by the current user. Note that this store type is not available in Java.
MACHINE: For Windows, this specifies that the certificate store is a machine store. Note that this store type is not available in Java.
PFXFILE: The certificate store is the name of a PFX (PKCS12) file containing certificates.
PFXBLOB: The certificate store is a string (base64 encoded) representing a certificate store in PFX (PKCS12) format.
JKSFILE: The certificate store is the name of a Java key store (JKS) file containing certificates. Note that this store type is only available in Java.
JKSBLOB: The certificate store is a string (base64 encoded) representing a certificate store in JKS format. Note that this store type is only available in Java.
PEMKEY_FILE: The certificate store is the name of a PEM-encoded file that contains a private key and an optional certificate.
PEMKEY_BLOB: The certificate store is a string (base64 encoded) that contains a private key and an optional certificate.
PUBLIC_KEY_FILE: The certificate store is the name of a file that contains a PEM- or DER-encoded public key certificate.
PUBLIC_KEY_BLOB: The certificate store is a string (base64 encoded) that contains a PEM- or DER-encoded public key certificate.
SSHPUBLIC_KEY_FILE: The certificate store is the name of a file that contains an SSH-style public key.
SSHPUBLIC_KEY_BLOB: The certificate store is a string (base64 encoded) that contains an SSH-style public key.
P7BFILE: The certificate store is the name of a PKCS7 file containing certificates.
PPKFILE: The certificate store is the name of a file that contains a PPK (PuTTY Private Key).
XMLFILE: The certificate store is the name of a file that contains a certificate in XML format.
XMLBLOB: The certificate store is a string that contains a certificate in XML format.
SSL Client Cert Password
Specify the password for the client certificate. If the certificate store is of a type that requires a password, this property is used to specify that password to open the certificate store.
SSL Client Cert Subject
Specify the subject of the client certificate. The subject is a comma separated list of distinguished name fields and values.
Consider the following points:
If an exact match is not found, the store is searched for subjects containing the value of the property.
If a match is still not found, the property is set to an empty string, and no certificate is selected.
The special value * picks the first certificate in the certificate store.
SSL Server Cert
Specify the TLS/SSL certificate to be accepted from the server.
Accepted Values:
A full PEM Certificate
A path to a local file containing the certificate
The public key
The MD5 Thumbprint (hex values can also be either space or colon separated)
The SHA1 Thumbprint (hex values can also be either space or colon separated)
If not specified, any certificate trusted by the machine is accepted.
Certificates are validated as trusted by the machine based on the system’s trust store. The trust store used is the javax.net.ssl.trustStore value specified for the system. If no value is specified for this property, Java’s default trust store is used (for example, JAVA_HOME\lib\security\cacerts).
Use * to accept all certificates. Note that this is not recommended due to security concerns.
Important
SSL server certificate isn’t supported. If you use a server certificate for connection, contact Alation Support.
Save the details.
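The BLOB certificate types described above (for example, PFXBLOB) expect the certificate store as a base64-encoded string rather than a file name. A minimal sketch of producing such a string from a PFX (PKCS12) file follows. The helper names and the file name `client.pfx` are hypothetical.

```python
import base64
from pathlib import Path


def encode_cert_store(data: bytes) -> str:
    """Return the base64 text form of a binary certificate store."""
    return base64.b64encode(data).decode("ascii")


def encode_cert_file(path: str) -> str:
    """Read a certificate store file and return its contents base64 encoded."""
    return encode_cert_store(Path(path).read_bytes())


# Usage (hypothetical file name):
# blob = encode_cert_file("client.pfx")
```

The resulting string would be the value for SSL Client Cert when SSL Client Cert Type is set to PFXBLOB.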
Test the Connection¶
The connection test checks database connectivity.
After configuring authentication, test the connection.
To validate the network connectivity, go to General Settings > Test Connection on the Settings page of your Azure Cosmos DB data source and click Test.
A dialog box appears confirming the status of the connection test.
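If the test fails, a quick TCP reachability check from a machine on the same network path can help separate network problems from authentication problems. The sketch below is an independent pre-check and not what the Test button runs. The function names are hypothetical, and the sample endpoint is a placeholder.

```python
import socket
from urllib.parse import urlparse


def endpoint_host_port(account_endpoint: str) -> tuple:
    """Split an account endpoint URL into (host, port)."""
    parsed = urlparse(account_endpoint)
    if not parsed.hostname:
        raise ValueError(f"Not a valid endpoint URL: {account_endpoint}")
    # Fall back to the scheme's standard port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return parsed.hostname, port


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Usage (placeholder endpoint):
# host, port = endpoint_host_port("https://myaccount.documents.azure.com:443/")
# print(can_reach(host, port))
```

A successful TCP connection only shows that the port is reachable; it says nothing about whether the credentials are valid.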
Configure Additional Connection Settings¶
Apart from the mandatory configurations that you perform to connect to the data source on the General Settings tab, configure the following additional settings:
Note
In the General Settings tab, leave the Additional data source connection field blank and skip the Disable automatic lineage generation toggle as these options are not applicable to the Azure Cosmos DB OCF connector.
Configure Firewall Settings¶
On the Settings page of your Azure Cosmos DB data source, go to the General Settings tab.
In the Connector Settings section, provide the following details in the Firewall section.
Field
Description
Firewall Type
Specify the protocol used by the proxy-based firewall for traffic tunneling.
Available Options:
NONE: Default.
TUNNEL: Opens a connection to Azure Cosmos DB and traffic flows back and forth through the proxy.
The default port is 80.
SOCKS4: Sends data through the SOCKSv4 proxy as specified in Firewall Server and Firewall Port.
The default port is 1080.
SOCKS5: Sends data through the SOCKSv5 proxy as specified in Firewall Server and Firewall Port.
The default port is 1080.
Firewall Server
Specify the host name, DNS name, or IP address of the proxy-based firewall.
Firewall Port
Specify the TCP port of the proxy-based firewall.
Firewall User
Specify the user name to authenticate with the proxy-based firewall.
Firewall Password
Specify the password to authenticate with the proxy-based firewall.
Save the details.
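The default ports in the Firewall Type description can be summarized as a small lookup. This is a sketch for planning firewall rules. The names are hypothetical, and the rule that an explicitly set Firewall Port overrides the default is an assumption of this example.

```python
# Default ports per Firewall Type, taken from the table above.
DEFAULT_FIREWALL_PORTS = {"TUNNEL": 80, "SOCKS4": 1080, "SOCKS5": 1080}


def effective_firewall_port(firewall_type: str, firewall_port=None):
    """Return the port to expect for a firewall type, or None when no firewall is used."""
    kind = firewall_type.upper()
    if kind == "NONE":
        return None
    if kind not in DEFAULT_FIREWALL_PORTS:
        raise ValueError(f"Unknown firewall type: {firewall_type}")
    # Assumption: an explicitly configured Firewall Port overrides the default.
    if firewall_port is not None:
        return firewall_port
    return DEFAULT_FIREWALL_PORTS[kind]
```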
Configure Proxy Settings¶
On the Settings page of your Azure Cosmos DB data source, go to the General Settings tab.
In the Connector Settings section, provide the following details in the Proxy section.
Field
Description
Proxy Auto Detect
Select this to use the system proxy settings. Don’t select this if you’re using custom proxy settings.
For SOCKS proxy, select the appropriate value in Firewall Type.
Proxy Server
Specify the hostname or IP address of a proxy to route HTTP traffic.
For SOCKS proxy, select the appropriate value in Firewall Type.
Proxy Port
Specify the TCP port the Proxy Server is running on.
Default: 80
Proxy Auth Scheme
Specify the authentication type to use to authenticate to the proxy server.
Available Values:
BASIC: (Default) Enables HTTP basic authentication.
DIGEST: Enables HTTP digest authentication.
NONE: No proxy authentication.
NEGOTIATE: Retrieves an NTLM or Kerberos token based on the applicable protocol for authentication.
NTLM: Retrieves only NTLM token based on the applicable protocol for authentication.
PROPRIETARY: Adds a custom token in the Authorization header of the HTTP request. It doesn’t generate NTLM or Kerberos token.
Proxy User
Specify the username to authenticate to the proxy server based on the chosen Proxy Auth Scheme.
If you are using Windows or Kerberos authentication, set this property to a user name in one of the following formats:
user@domain
domain\user
Proxy Password
Specify the password to authenticate to the proxy server based on the chosen Proxy Auth Scheme.
Proxy SSL Type
Select the SSL type when connecting to the proxy server.
Available Values:
AUTO: (Default) If the URL is an HTTPS URL, the provider will use the TUNNEL option. If the URL is an HTTP URL, the component will use the NEVER option.
ALWAYS: The connection is always SSL enabled.
NEVER: The connection is not SSL enabled.
TUNNEL: The connection is established through a tunneling proxy. The proxy server opens a connection to the remote host and traffic flows through the proxy.
Proxy Exceptions
Specify a semicolon separated list of destination hostnames or IPs that are exempt from connecting through the proxy server.
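Two of the proxy rules above are easy to misread, so here is a sketch of how they could be expressed: resolving the AUTO value of Proxy SSL Type from the URL scheme, and checking a host against a semicolon-separated Proxy Exceptions value. The function names are hypothetical, and exact, case-insensitive host matching is an assumption; the connector's own matching rules may differ.

```python
from urllib.parse import urlparse


def resolve_proxy_ssl_type(proxy_ssl_type: str, url: str) -> str:
    """Resolve AUTO to TUNNEL for HTTPS URLs and NEVER for HTTP URLs."""
    kind = proxy_ssl_type.upper()
    if kind != "AUTO":
        return kind
    return "TUNNEL" if urlparse(url).scheme.lower() == "https" else "NEVER"


def is_proxy_exempt(host: str, proxy_exceptions: str) -> bool:
    """Check a host against a semicolon-separated Proxy Exceptions value."""
    # Assumption: entries are matched exactly, ignoring case and surrounding spaces.
    exempt = {item.strip().lower() for item in proxy_exceptions.split(";") if item.strip()}
    return host.lower() in exempt
```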
Configure Logging Settings¶
On the Settings page of your Azure Cosmos DB data source, go to the General Settings tab.
In the Connector Settings section, provide the following details in the Logging section.
Field
Description
Verbosity
Specify a verbosity level from 1 to 5 to control the amount of detail included in the log file.
Available Values:
- 1: Logs the query, the number of rows returned by the query, the execution time, the start time of execution, and any errors.
- 2: Logs everything included in verbosity level 1, plus cache queries and any additional information about the request.
- 3: Logs everything included in verbosity level 2, plus HTTP headers, the request body, and the response body.
- 4: Logs everything included in verbosity level 3, plus transport-level communication with the data source. This includes SSL negotiation.
- 5: Logs everything included in verbosity level 4, plus communication with the data source and additional details that may be helpful in troubleshooting. This includes interface commands.
Log Modules
Includes the core modules in the log files. Add module names separated by a semi-colon.
By default, all modules are included.
Max Log File Count
Specify the maximum file count for log files. After the limit, the log file is rolled over and time is appended at the end of the file. The oldest log file is deleted.
Maximum Value: 2
Default: -1. A negative or zero value indicates unlimited files.
Configure Schema Settings¶
On the Settings page of your Azure Cosmos DB data source, go to the General Settings tab.
In the Connector Settings section, provide the following details in the Schema section.
Field
Description
Browsable Schemas
Specify the schemas as subset of the available schemas in a comma separated list. For example, BrowsableSchemas=SchemaA,SchemaB,SchemaC
Tables
Specify the fully qualified name of the table as a subset of the available tables in a comma separated list.
For example, Tables=TableA,TableB,TableC
Each table must be a valid SQL identifier that might contain special characters escaped using square brackets, double-quotes, or backticks.
For example, Tables=TableA,[TableB/WithSlash],WithCatalog.WithSchema.`TableC With Space`.
Views
Specify the fully qualified names of the views as a subset of the available views in a comma-separated list.
For example, Views=ViewA,ViewB,ViewC.
Each view must be a valid SQL identifier that might contain special characters escaped using square brackets, double-quotes, or backticks.
For example, Views=ViewA,[ViewB/WithSlash],WithCatalog.WithSchema.`ViewC With Space`.
Schema
Specify the Azure Cosmos DB database you want to work with.
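The escaping rule for Tables and Views can be sketched as a helper that wraps names containing special characters in square brackets, one of the three escape styles mentioned above. The helper names are hypothetical, and the sketch handles unqualified names only; it does not split catalog- or schema-qualified names.

```python
import re

# Names made only of letters, digits, and underscores need no escaping.
_PLAIN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def escape_identifier(name: str) -> str:
    """Wrap a name in square brackets when it contains special characters."""
    if _PLAIN.match(name):
        return name
    if "]" in name:
        raise ValueError(f"Cannot bracket-escape a name containing ']': {name}")
    return f"[{name}]"


def build_list_property(key: str, names: list) -> str:
    """Build a value such as Tables=TableA,[TableB/WithSlash]."""
    return f"{key}=" + ",".join(escape_identifier(n) for n in names)


print(build_list_property("Tables", ["TableA", "TableB/WithSlash"]))
```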
Configure Miscellaneous Settings¶
On the Settings page of your Azure Cosmos DB data source, go to the General Settings tab.
In the Connector Settings section, provide the following details in the Misc section.
Field
Description
Batch Size
Specify the maximum size of each batch operation.
Default: 0
Calculate Aggregates
Specifies whether to return the calculated value of the aggregates or grouped by partition range.
Connection Lifetime
Specify the maximum limit for a connection to stay connected in seconds.
Default: 0 indicates unlimited lifetime for a connection.
Flatten Arrays
Specify an arbitrary number to flatten the elements in a nested array into columns. By default, the nested arrays are returned as JSON strings.
Set it to -1 to flatten all the elements.
Flatten Objects
Select this to flatten the object properties in a nested array into columns. By default, the nested arrays are returned as JSON strings.
Force Query On Non Indexed Containers
Force the use of an index scan to process the query if indexing is disabled or the right index path is not available.
Generate Schema Files
Specify the preference when to generate and save the schemas.
Available Options:
Never: Doesn’t generate a schema file.
OnUse: A schema file is generated the first time a table is referenced, provided the schema file for the table does not already exist. In SQL, the schemas are generated as you execute SELECT queries.
OnStart: A schema file is generated at connection time for any tables that do not currently have a schema file.
OnCreate: A schema file is generated when running a CREATE TABLE SQL query.
Max Rows
Specify the limit for the number of rows returned if no aggregation or GROUP BY is used in the query. This takes precedence over LIMIT clauses.
Max Threads
Specifies the maximum number of concurrent requests for Batch CUD (Create, Update, Delete) operations.
Multi Thread Count
Aggregate queries in partitioned collections will require parallel requests for different partition ranges.
Set this to the number of parallel requests to be issued at the same time.
Other
Specify the caching, integration, or formatting properties in a list format separated by a semicolon.
Available Options:
Caching Configuration:
CachePartial=True: Caches only a subset of columns specified in the query.
QueryPassthrough=True: Passes the specified query to the cache database instead of using the SQL parser of the provider.
Integration and Formatting:
DefaultColumnSize: Sets the default length of string fields when the data source does not provide column length in the metadata.
The default value is 2000.
ConvertDateTimeToGMT: Converts date-time values to GMT instead of the local time of the machine.
RecordToFile=filename: Records the underlying socket data transfer to a specified file.
Page size
Specify the maximum number of results to return per page from Azure Cosmos DB.
A higher value results in better performance but uses more memory.
Pool Idle Timeout
Specify the idle time for a connection in a pool.
Default: 60 seconds
Pool Max Size
Specify the maximum number for connections in a pool.
To disable, set the value to 0 or less.
Default: 100
Pool Min Size
Specify the minimum number for connections in a pool.
Default: 1
Pool Wait Time
Specify the maximum wait duration for a connection to become available. If a new connection request is in wait for an available connection but exceeds the time, an error is thrown. By default, new connection requests have a forever wait time for an available connection.
Default: 60 seconds
Pseudo Columns
Specify the pseudo columns in the comma-separated list to be added as columns to the table.
For example, “Table1=Column1, Table1=Column2, Table2=Column3”.
Use the * character to include all tables and columns in this format:
*=*
Read only
Select this to enforce only SELECT queries to work on Azure Cosmos DB.
Retry Wait Time
Specify the minimum number of milliseconds the provider needs to wait to retry a request.
Default: 2000
Row Scan Depth
Specify the maximum number of rows to scan for the available columns in a table.
Set it to -1 to scan an arbitrary number of rows.
Separator Character
Specify the character or characters to denote hierarchy or separate columns.
Default: .
Note: If your data has columns that use a period (.) within the attribute name, specify any other character.
Set Partition Key As PK
Select this to use the collection’s Partition Key field as part of composite Primary Key for the corresponding exposed table.
Timeout
Specify the time limit in seconds after which the operation is canceled and an error is thrown.
A value of 0 specifies that the operation never times out until completion or failure.
Default: 60 seconds
Type Detection Scheme
Specify how to scan data to determine the fields and datatypes in a document collection.
Available Values:
None: Returns all columns as strings.
Rowscan: Scans rows to heuristically determine the data type.
Recent: Scans the rows to heuristically determine the data type for the recent documents in a collection.
Use Connection Pooling
Select this to enable connection pooling.
Use Consistent Reads
Select this to always use Consistent Reads when querying Azure Cosmos DB.
User Defined Views
Specify the file path pointing to the JSON configuration file that contains custom views.
Use Rid As PK
Select this to use the default _rid as the primary key instead of the id column.
Write Throughput Budget
Defines the Request Units (RU) budget per second that the Batch CUD (Create, Update, Delete) operations should not exceed.
Default: 1000
Click Save.
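The Other field in the table above takes a semicolon-separated list of properties. As an illustration, such a value can be assembled from name and value pairs as follows. The function name `build_other_value` is hypothetical, and the sketch does not check whether a property name is one the connector recognizes.

```python
def build_other_value(properties: dict) -> str:
    """Join name/value pairs into a semicolon-separated list for the Other field."""
    parts = []
    for name, value in properties.items():
        # str() renders Python booleans as True/False, matching the table's examples.
        text = str(value)
        if ";" in name or ";" in text:
            raise ValueError("Semicolons are reserved as the separator.")
        parts.append(f"{name}={text}")
    return ";".join(parts)


print(build_other_value({"CachePartial": True, "DefaultColumnSize": 2000}))
```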
Disable Obfuscate Literals¶
You can hide literal values in queries that are ingested through query log ingestion and displayed on the Queries tab of schema and table catalog objects.
Go to General Settings > Obfuscate Literals of the Settings page of your Azure Cosmos DB data source and disable the Obfuscate literals toggle.
When enabled, literal values are substituted with placeholder values. Disable this option when you want literal values in queries to be visible to users.
By default, this option is disabled.
Configure Logging¶
To set the logging level for your Azure Cosmos DB OCF data source logs, perform these steps:
On the Settings page of your Azure Cosmos DB OCF data source, go to General Settings > Logging configuration.
Select a logging level for the connector logs and click Save.
The available log levels are based on the Log4j framework.
You can view the connector logs in Admin Settings > Server Admin > Manage Connectors > Azure Cosmos DB OCF connector.