Bot Configuration¶
Alation Cloud Service Applies to Alation Cloud Service instances of Alation
Customer Managed Applies to customer-managed instances of Alation
ASM Settings for Bots¶
Visit your Alation Services Manager (ASM) settings page to ensure the following are configured:
Alation instance: Configure your Alation instance’s details here. You can choose to configure basic authentication (username and password) or a refresh token.
Alation Analytics: Configure your Alation Analytics database details. Snowflake- and Postgres-hosted Alation Analytics databases are supported.
SMTP: Optional server details for Bot notifications via email, if required.
Slack: Optional Slack app details for Bot notifications via Slack, if required.
Configuring a Bot¶
We will start with a sample Bot configuration. Below is a “Data Source Ownership” Bot which:
Reads all data sources and associated attributes from the rdbms_datasources Alation Analytics table
Checks if the Steward OR the Data Owner field for a data source is empty (you can choose between AND/OR operations)
If empty then start a conversation to notify the user(s) who have been assigned as stewards to the datasource. If there are none then notify the user #2 who is the default user with subject “Datasource without Steward(s) or Data Owner(s)”
Bot configurations use json format. To create this Bot, go to your Bot Manager in ASM, click the Add Bot button, click Edit, and copy this json into the configuration section:
{
"bot_name": "Data Source Ownership",
"description": "Bot to check for data sources without Steward(s) or Data Owner(s)",
"conn_type": {"sf":
{"otype": "DATA",
"aa_query": "SELECT * FROM public.rdbms_datasources WHERE deleted = False",
"last_etl": "SELECT last_load_time from information_schema.load_history where table_name = 'STG_RDBMS_DATASOURCES' ORDER BY last_load_time DESC LIMIT 1"}
},
"create_policy": True,
"rules": [
{
"fields_to_check": [
{"Steward":{"condition":"IS NULL", "field_type": "OBJECT_SET"}},
{"Data Owner":{"condition":"IS NULL", "field_type": "OBJECT_SET"}}
],
"fields_to_check_operator":"or",
"notification": {"type": "conversation",
"subject": "Datasource without Steward(s) or Data Owner(s)",
"frequency": "always",
"to": [{"column": "steward"}]
"default": [{"user": 2}]}
}
]
}
The column in the to section can be any valid column in the query response that contains valid user or group id details. In the default column on the last line, a valid Alation user or group id should be given.
Click the Validate button to check the format, then click Save. Your Bot has now been created, and the corresponding Bot Policy will automatically appear in your Alation catalog.
Configuring a Schedule¶
Bots are run manually by default but you can choose to schedule each Bot individually to run automatically at a frequency of your choosing.
Schedule a Bot in ASM’s Bot Manager screen by enabling its Scheduled toggle and enter a cron string alongside. For example, to run the Bot at midnight every day use cron string 0 0 * * *.
For more examples of cron strings please refer to this external site. To check your cron string enter it here.
Bot Parameters¶
Here is a complete list of the Bot configuration parameters:
bot_name |
The name of the Bot in the Bot Manager The name of the Bot Violation Policy within Alation |
description |
This description will be added to the description in the created Bot Violation Policy within Alation |
conn_type |
How your Bot will interact with Alation Analytics. There are two possible settings here. sf represents Snowflake, and this value tells your bot to use the Snowflake connector to communicate with Alation Analytics pg represents Postgres, and this value tells your bot to use the Postgres connector to communicate with Alation Analytics Example:
|
otype |
Alation otype that the Bot will be working with. See supported otypes below |
custom_template_name |
This is required when running a Bot against a BUSINESS_POLICY. The entry needs to be the exact name of the policy template entered into Alation. |
aa_query |
Data that the Bot will be analyzing. See Alation Analytics example queries below |
last_etl |
When a Bot runs, it stores the date_time of that run. Before it runs again, it compares the stored date_time to the last run of Alation ETL to see if there is any new data. If there is no new data, the Bot will not run. See last_etl examples below. |
create_policy |
Bot Violation policies are not always required. Pass true or false to instruct the Bot to create a policy. There will be times when you just want your Bot to run and perform an action if a condition is met. For example, attach a specific policy to an object if a condition is met. This example does not need a Bot Violation Policy; just have the Bot perform the action. |
rules |
A collection of rules your bot will run. Rules are waterfall-based, and the first rule condition satisfied wins. Any following rules will not be examined. Bots have a max limit of 4 rules. |
fields_to_check |
These are the fields that the condition in the rule should check. They are Alation fields and must be typed exactly as they are labeled within Alation. Spaces are acceptable, and the names have to match. Multiple fields_to_check entries are acceptable. If more than one entry, then the fields_to_check_operator is required. Example: "fields_to_check":[
{"Certified":{"condition":"IS NULL", "field_type": "PICKER"}},
{"Phase One":{"condition":"IS NULL", "field_type": "PICKER"}},
{"Phase Two":{"condition":"IS NULL", "field_type": "PICKER"}},
{"Phase Three":{"condition":"IS NULL", "field_type": "PICKER"}}
],
"fields_to_check_operator": "or",
|
fields_to_check_operator |
If more than one fields_to_check entry is added to a rule, a fields_to_check_operator is required. This can be and or or. |
properties_to_check |
You can create properties to check from your aa_query. These properties are analyzed just like fields_to_check. Multiple properties_to_check are permitted, but if more than one, the properties_to_check_operator is required. Example: "aa_query": "SELECT \*, DATEDIFF(day, verified_date::DATE,
CURRENT_DATE) as date_span FROM public.business_policy
WHERE policy_type_name = 'Compliance' AND deleted = False",
{"date_span":{"condition":"GREATER THAN", "value":"120", "field_type": "TEXT"}}
|
properties_to_check_operator |
If more than one properties_to_check entry is added to a rule, a fields_to_check_operator is required. This can be and or or. |
condition |
Supported conditions are IS_NULL IS NOT NULL CONTAINS DOES NOT CONTAIN GREATER THAN LESS THAN |
value |
Required if the following condition is used CONTAINS DOES NOT CONTAIN GREATER THAN LESS THAN |
field_type |
Support field types are TEXT DATE MULTI_PICKER OBJECT_SET PICKER RICH_TEXT |
actions_to_perform |
If a condition is satisfied, one or more actions can be performed. An action is altering the data within a field. Example: {"Business Policy":{"action":"ADD","field_type": "OBJECT_SET", "values":["business_policy_24"]}}
|
action |
Supported actions are ADD REMOVE |
field_type |
Supported Alation fields are TEXT DATE MULTI_PICKER OBJECT_SET PICKER RICH_TEXT |
value |
The value that will be passed to the field. Notice that this is a list since a MULTI_PICKER field is supported. |
Notifications¶
These are the parameters used when a notification is required:
type |
conversation slack |
subject |
{string} |
frequency |
updates - Any time there is a change in the violation count + or - always - Every time the Bot runs, a notification is sent regardless of the violation count |
to |
a list of objects that specify which user to notify if the validation fails. Each column in the to section can be any valid column in the query response that contains valid user or group id details (optional). |
default |
a list of valid user information based on the selected type of notification in the config. (user or group id for conversations; or valid email id for email and slack) |
Conversation Example "notification": {
"type": "conversation",
"subject": "Term not curated",
"frequency": "always",
"to": [
{
"column": "steward"
}
],
"default": [
{
"user": 6
},
{
"user": 2
}
]
}
A Conversation is then created with the specified Alation user ID - users 6 and 2 in this example. |
|
Email Example "notification": {
"type": "email",
"subject": "Schema Completeness",
"frequency": "always",
"to": [
{
"column": "steward"
},
{
"column": "data_owner"
}
],
"default": [
"[email protected]"
]
}
|
|
Slack Example "notification": {
"type": "slack",
"channel": "bot-auto-notification",
"subject": "Notification Check",
"frequency": "always",
"to": [
{
"column": "steward"
}
],
"default": [
"[email protected]"
]
}
|
Supported otypes¶
These are the Alation object types supported and the corresponding AAv2 tables:
otype |
AAv2 table |
SCHEMA |
rdbms_schemas |
TABLE |
rdbms_tables |
ATTRIBUTE |
rdbms_columns |
DATA |
rdbms_datasources |
BUSINESS_POLICY |
business_policy |
“aa_query” Examples¶
The following example SQL queries can be used in the aa_query parameter:
Data Source |
SELECT * FROM public.rdbms_datasources WHERE deleted = False AND EXCLUDED = False |
Schema |
SELECT * FROM public.rdbms_schemas WHERE deleted = False AND EXCLUDED = False |
Table |
SELECT * FROM public.rdbms_tables WHERE deleted = False AND EXCLUDED = False |
Column (Attribute) |
SELECT * FROM public.rdbms_columns WHERE deleted = False AND EXCLUDED = False |
Policy |
SELECT * FROM public.business_policy WHERE deleted = False AND EXCLUDED = False |
More complex queries can be used. The following example forms a flattened list of stewards’ email addresses and data_owners along with associated schemas, titles and descriptions.
WITH tmp_people AS (
SELECT
schema_id,
SPLIT_PART(f.value::STRING, '_', 1) AS people_otype,
TRY_TO_NUMBER(SPLIT_PART(f.value::STRING, '_', 2)) AS people_oid,
FROM public.rdbms_schemas,
LATERAL FLATTEN(input => steward) AS f -- Flatten the steward array
-- steward_value is the alias for flattened values
WHERE f.value IS NOT NULL AND f.value != ''
),
-- Aggregate email by schema_id
processed_people_set AS (
SELECT
TP.schema_id,
ARRAY_AGG(USR.user_email) AS steward
FROM tmp_people TP
JOIN public.users USR ON TP.people_oid = USR.user_id
WHERE TP.people_otype = 'user'
GROUP BY TP.schema_id
)
SELECT
PPS.steward,
RT.data_owner,
RT.schema_id,
RT.title,
RT.description
FROM public.rdbms_schemas RT
LEFT JOIN processed_people_set PPS ON RT.schema_id = PPS.schema_id
“last_etl” examples¶
For the second SQL query parameter labelled “last_etl”: Each time the Alation Analytics ETL job runs within your Alation Instance it leaves a “last updated” timestamp in the target database for each object it updates. If you make a catalog change to table metadata for example, the last updated stamp for the rdbms_table table will be updated to that time. When a Bot runs it will check to see if it has a stored timestamp from the last time the Bot ran. If it does, it will compare that timestamp to the one in the data object that the Bot is running against. If the time stamp for the data object is newer then the one stored, it will run the bot because there must be some new data. Otherwise the Bot will not run. If there is no timestamp stored because this is the first time the Bot has been run, it will ignore timestamps and just run. This efficiency feature eliminates unnecessary Bot executions on data that has not changed.
Data Source |
SELECT last_load_time from information_schema.load_history where table_name = ‘STG_RDBMS_DATASOURCES’ ORDER BY last_load_time DESC LIMIT 1 |
Schema |
SELECT last_load_time from information_schema.load_history where table_name = ‘STG_RDBMS_SCHEMAS’ ORDER BY last_load_time DESC LIMIT 1 |
Table |
SELECT last_load_time from information_schema.load_history where table_name = ‘STG_RDBMS_TABLES’ ORDER BY last_load_time DESC LIMIT 1 |
Column (Attribute) |
SELECT last_load_time from information_schema.load_history where table_name = ‘STG_RDBMS_COLUMNS’ ORDER BY last_load_time DESC LIMIT 1 |
Policy |
SELECT last_load_time from information_schema.load_history where table_name = ‘STG_BUSINESS_POLICY’ ORDER BY last_load_time DESC LIMIT 1 |
Further Bot Examples¶
It is often useful to create Bots with rules based on elapsed time. This type of Bot can be used for managing content that should be checked on a regular basis, or finding content that may have expired for example.
This is achieved by creating a date diff property in the aa_query being run. The date diff will return the number of days between two dates. So in the query we take the current date and the date stored in a custom field and return the difference in days. Then in the Bot rule we test that number and choose to do something if the number exceeds some threshold.
The following example Bot checks that Compliance type policies are verified no later than every 120 days. Aspects including the notification, policy type, and time range can be adjusted to meet your needs:
{
"bot_name": "Bot Sample - Verified Policy",
"description": "Bot to check if policy has been verified within a time range",
"conn_type": {
"sf":{
"otype": "BUSINESS_POLICY",
"custom_template_name": "Compliance",
"aa_query": "SELECT *, DATEDIFF(day, verified_date::DATE, CURRENT_DATE) as date_span FROM public.business_policy WHERE policy_type_name = 'Compliance' AND deleted = False",
"last_etl": "SELECT last_load_time from information_schema.load_history WHERE table_name = 'STG_BUSINESS_POLICY' ORDER BY last_load_time DESC LIMIT 1"
}
},
"create_policy": true,
"rules": [
{
"properties_to_check": [
{"date_span":{"condition":"GREATER THAN", "value":"120", "field_type": "TEXT"}}
],
"notification": {"type": "conversation",
"subject": "Policy verification has exceeded the 120 day limit. Please re-verify.",
"frequency": "always",
"to": [
{
"column": "steward"
}
],
"default": [{"user": 2}]}
}
]
}
Sample Bots on Github¶
A wide variety of sample Bots can be found on our public GitHub page.