Virtual RDBMS Data Sources¶
Alation Cloud Service Applies to Alation Cloud Service instances of Alation
Customer Managed Applies to customer-managed instances of Alation
Important
You are viewing documentation for Classic Alation.
This article describes how to import metadata from a CSV source file into a virtual data source created for a relational database or a Hive database.
For information on Virtual Data Source API, refer to Upload a Virtual Data Source on Alation’s Developer Portal.
Virtual RDBMS data sources include basic technical metadata. The fields available for importing are:
Name
Table type (for tables)
Column type (for columns)
Import Schemas, Tables, Columns, and Indices¶
You can import metadata in the CSV format to populate a virtual data source. The CSV file must meet several requirements. Prepare your CSV source file observing the format described below.
Important
Property values are case-sensitive. Use them as recommended. For example, Boolean properties should be in lowercase: true
and false
. Ensure all values are correctly formatted before uploading your CSV file.
Requirements for Head and Keys¶
The file must include the head, declaring keys and properties, and then each line in the file should consist of a specific key along with the properties that apply, with empty values for properties that do not apply.
Head¶
Sample file head: key, table_type, column_type
Note that table_type
can be VIEW
or TABLE
.
Key¶
Alation determines the data object type based on how many dots there are in key:
schema
is a schemaschema.table
is a tableschema.table.column
is a columnschema.table.index
is recognized as index if the key is accompanied by the property index_type.
If your database type supports multipart schema (for example, this is the case for SQL Server, Amazon Redshift, and Netezza), then the key structure must be as follows:
dbname.schema
is a schemadbname.schema.table
is a tabledbname.schema.table.column
is a columndbname.schema.table.index
is an index if the key is accompanied by the property index_type.
For more details on key format, see Keys with Special Characters on Alation’s Developer Portal.
Sample CSV file:
"key","table_type","column_type"
"test_schema","",""
"test_schema.table1","TABLE",""
"test_schema.table2","VIEW",""
"schema.table1.column1","","varchar(10)"
"schema.table2.column1","","varchar(10)"
Add Table and View Properties¶
Along with the key, you can also add the following properties, all of them optional.
Property |
Description |
Example |
---|---|---|
|
Specifies the type of
the
table. |
|
|
A URI or file path to
the location of the
underlying data, such
as, for example, an
HDFS URL for a Hive
table. Use this
parameter only when
the |
|
|
Name of the database account that owns this table. |
|
|
|
|
|
|
|
|
|
|
|
Constraint statements
enforced by
the DB. Use this
parameter only when
the |
|
|
Timestamp at which the table or view was created. Ensure the timestamp is in UTC and follows the format: YYYY-MM-DDThh:mm:ssZ |
|
|
Timestamp of the last ALTER statement executed against this table. Ensure the timestamp is in UTC and follows the format: YYYY-MM-DDThh:mm:ssZ |
|
|
An array of columns |
|
|
An array of columns used to bucket the table. |
|
|
An array of columns used to sort the table. |
|
|
An array of other names that can be used to refer to this table. Each synonym is represented as a JSON comprising a schema_name and table_name. |
|
|
A JSON of the skew column names to an array of their respective skewed column values that appear often. |
|
|
A comment field that stores a description of the table which is ingested from the source system. |
|
Note
You can have newline in the text values of the fields by adding a newline wherever needed. For example, newline can be inserted after AS:
CREATE VIEW view_name AS
SELECT column1, column2 FROM table_name WHERE condition;
For non-primitive data types, single quotes must be used to
enclose the string in the data type. For example: ['column1', 'column2']
.
Add Column Properties¶
For columns, you can specify the following properties, all of them optional:
Property |
Description |
Example |
---|---|---|
|
This property is optional but recommended and can be any string. |
|
|
Position of the column in the table which has it. This value needs to be a positive integer. If unspecified, the value defaults to the order in which it was uploaded. |
|
|
A comment field that stores a description of the column which is ingested from the source system. |
|
|
Field to indicate
if the column can
be nullable. Set
this to |
|
Add Indices¶
For indices, you can specify the following properties:
Property |
Required |
Description |
---|---|---|
|
yes The presence of this field identifies an index object in a column. |
This property is an enum. The value for this property can be one of:
For details, see Understand the Index Key Types and Icons When specifying an index, ensure that the corresponding table is already a part of the database metadata. Even for index upsert, this field is required. Example: Use |
|
yes |
An array of column names on which the index is defined. If the index is composite, this array will have multiple column names.
Example: |
|
no |
The underlying data structure used by the index. The value for this field can be one of:
Default: |
|
no |
A string having custom detailed information about the index. Example: |
|
no |
Set this boolean to This is not valid for the composite index. |
|
no |
Filter condition used while creating an index for part of the rows in the table. This is not valid for composite index. Example: |
|
no |
Set this boolean to When this is set to |
|
yes if
|
The key of the parent table object which the foreign index refers to. This is required only if Example: |
|
yes if
|
An array of column names on the parent table object which the foreign index refers to.
Example: |
Sample CSV file:
"key","table_type","column_type","index_type","column_names"
"public","","","",""
"public.customers","VIEW","","",""
"public.parts","TABLE","","",""
"public.parts.id","","varchar(40)","",""
"public.parts.index","","","SECONDARY","['id']"
Understand the Index Key Types and Icons¶
You can use one of the following index types (index_type
):
PRIMARY: Ensures the uniqueness of the primary key in a table.
SECONDARY: Specifies additional indices to speed up queries related to non-primary keys.
PARTITIONED_PRIMARY: Indicates a partitioned primary index for efficiently managing large tables.
UNIQUE: Enforces the uniqueness of values in a column or set of columns.
OTHER: Indicates custom or context-specific index types specific to certain database systems or applications.
Based on the value you set for the index_type
property, one of the following icons are displayed on catalog pages next to Columns:
Icon |
Description |
---|---|
Primary key icon. This icon is displayed when |
|
Foreign key icon. This icon is displayed when |
|
Index key icon. This icon is displayed when |
Encoding and Special Characters¶
If the data has Unicode characters, make sure the file is encoded in UTF8. Alation will show a preview before you confirm importing.
If a property in the head does not apply to the key, leave it empty.
Alation does not support symbols
' " [ ]
in the names loaded using CSV. If you must have these symbols in the names, you will have to use a complex escaping sequence.
Add, Update, and Delete Properties per Technical Metadata Object¶
Initially, you can upload only some of the properties for a key and
later update the same key with new properties or update or delete an
existing property value. For example, you can do one upload to add
view_sql
to all VIEWs and another upload to add ts_created
. This
second upload will retain the previously added view_sql
if you omit
that column.
Add Properties¶
Let us discuss examples of importing a TABLE, COLUMN and INDEX objects and their properties. Suppose, the first import uploads only some of the properties as illustrated in the Sample CSV below:
Sample CSV:
"key","table_type","column_type","db_owner","definition_sql","constraint_text","ts_created","ts_last_altered","partitioning_attributes","synonyms","skews_info","table_comment","index_type","column_names","index_type_detail","is_ascending
"public","","","","","","","","","","","","","","",""
"public.parts","TABLE","","alation","create table schema_a.table_a(column1 int);","column_a UNIQUE","2018-03-13T22:09:33Z","2018-03-13T22:09:33Z","['column1', 'column2']","[{'schema_name': 'schema_a','table_name':'table_a'},{'schema_name': 'schema_b','table_name':'table_b'}]","{'column1': ['column1_value1', 'column1_value2'],'column2': ['column2_value1', 'column2_value2']}","Created by DB","","","","",""
"public.parts.id","","int","","","","","","","","","","","","","",""
"public.parts.index","","","","","","","","","","","","SECONDARY","['id']","BTREE","MULTI_COLUMN_STATISTICS","true"
You can specify the following properties individually:
partitioning_attributes
bucket_attributes
sort_attributes
synonyms
Sample CSV:
"key","table_type","bucket_attribute1","bucket_attribute2","sort_attribute1","partitioning_attribute1","synonym1","synonym2"
"public.parts","TABLE","bucket_column1","bucket_column2","sort_column1","partition_column1","{'schema_name':'schema_a','table_name':'table_a'}","{'schema_name': 'schema_b','table_name': 'table_b'}"
Update Properties of an Existing Object¶
While creating the above TABLE object, we have not added
the data_location
property. We will next add that property and update
the value of the existing property db_owner
. For the COLUMN object, we
can also add position
and update column_type
. Similarly, for an
INDEX, we can add filter_condition
and update the existing property
data_structure
.
Sample update CSV:
"key","table_type","db_owner","data_location","position","index_type","filter_condition","data_structure"
"public.parts","TABLE","new_owner","hdfs:///user/hive/warehouse/parts","","","",""
"public.parts.id","","","","5","","",""
"public.parts.index","","","","","SECONDARY","([filteredIndexCol]>(0))","HASH"
All the properties mentioned while adding the object for the
first time are retained.
index_type
and is_foreign_key
cannot be changed after the index is
created.
foreign_key_table_name
and foreign_key_column_names
are required
if is_foreign_key
is part of the request. However, is_foreign_key
can be skipped to retain the existing foreign key reference
values.
Delete Properties of an Existing Object¶
For already uploaded properties of an object, you can delete the properties by uploading the same object with the property value set to null.
However, table_type
, index_type
, column_names
, is_foreign_key
,
foreign_key_table_name
, and foreign_key_column_names
property
values cannot be deleted.
Sample CSV:
"key","table_type","db_owner","data_location","position","index_type","data_structure","filter_condition"
"public.parts","","","","","",""
"public.parts.id","","","","","","",""
"public.parts.index","","","","","SECONDARY","",""
This upload removes the data_location
and db_owner
properties for
the TABLE object, removes column_type
and position
properties for
the COLUMN object, removes data_structure
and filter_condition
for
the INDEX object.
Upload the CSV File¶
To import metadata from a CSV file:
On the catalog page of the NoSQL data source, on the upper right, click Settings to open the data source settings page.
Cick the Import Metadata tab.
Upload or drag and drop the source CSV file into the upload area.
Review the uploaded file. Any formatting errors will appear in the Error column of the preview table.
In the top left corner of the uploaded table, click Confirm to confirm upload.
Refresh Job History. When the import is completed, the corresponding completed metadata extraction (MDE) job will be listed in the Job History table.
Navigate to the catalog page of the virtual data source to view the imported metadata.
On the catalog page of the NoSQL data source, click the three dots in the top right corner, then click Settings. The settings page will open.
Cick the Import Metadata tab.
Upload or drag and drop the source CSV file into the upload area.
Review the uploaded file. Any formatting errors will appear in the Error column of the preview table.
In the top left corner of the uploaded table, click Confirm to confirm upload.
Refresh Job History. When the import is completed, the corresponding completed metadata extraction (MDE) job will be listed in the Job History table.
Navigate to the catalog page of the virtual data source to view the imported metadata.