Requirements for CSV/TSV Source Files¶
Alation Cloud Service Applies to Alation Cloud Service instances of Alation
Customer Managed Applies to customer-managed instances of Alation
To upload a data dictionary successfully, make sure your CSV or TSV source file conforms to the format requirements below.
Important
Using third-party CSV file editing tools like Excel or Google Sheets to edit data dictionary files carries a risk of unintentional content modifications, specifically to the header line (the first line of the file). When editing your CSV file with these tools, ensure the export settings are configured to use commas as delimiters. It is also essential to preserve the original formatting of the column header line exactly as it was in the file downloaded from Alation.
For example, if a custom field name contains any of these special characters: ,
, :
, ;
, |
, they will be enclosed within additional double quotes in the downloaded file. A field name will appear as follows: """subject to:"":attribute"
. Quotes are removed by tools like Excel when the file is opened, edited, and saved. Verify the header format if you have fields with special characters.
Learn more in Special Characters in Custom Field Names.
Head¶
The source CSV or TSV file must include a head row with field names in the required format and the field values for each of the data objects to be updated:
Element |
Is required |
Description |
---|---|---|
|
No |
A column in the downloaded data dictionary containing object IDs and object types for each object. Should be considered read-only |
|
No |
A column in the downloaded data dictionary containing data types for each data column in included tables. Should be considered read-only |
|
Yes |
If any component of the key has an internal period, that component only should be triple-quoted. For example, if a schema has the name The data source object uses an empty key. |
|
Only if it is updated |
The title of the data object. |
|
Only if it is updated |
The description of the data object. |
All Custom Fields names to be updated |
Only if the field is updated |
The list of custom field names in the required format which will depend on the field type. |
Example Head
In a downloaded data dictionary, all fields are present in the head row. If you are creating a data dictionary source file from scratch, you can omit the al_datadict_item_properties
, and al_datadict_item_column_data_type
fields unless you’re updating the fields on a data source object page. If you are updating a data source object page, include the al_datadict_item_properties
field and its value as it’s required for identifying the data source object. Data source objects use an empty key value and cannot be identified unless the al_datadict_item_properties
field is present.
al_datadict_item_properties,al_datadict_item_column_data_type,key,title,description,summary,country,expert:user,relatedtable:table,test:article
Custom Field Format¶
The definition for the Object Set custom field must include the name of the field and the object type of the value in the following format: "field name:object type"
.
When uploading values for the Object Set field, you need to be aware of the Permitted Types included into this field. Overall, an Object Set custom field can refer to any combination of the following object types:
Data source
Schema
Table
Column
User
Group
Article
Term (2023.3.4 and later)
Policy (2023.3.4 and later)
For each Permitted Object that can be included in the field, you will need to add the associated object type. Use the following definitions:
Alation Object |
Associated object type to be used in CSV/TSV head |
Object set value format |
Example |
---|---|---|---|
data source |
data |
|
|
schema |
schema |
|
|
table |
table |
|
|
column |
attribute |
|
|
user |
user |
|
|
group |
groupprofile |
|
|
article |
article |
|
|
term |
glossary_term |
|
|
policy |
business_policy |
|
|
Important
As the name of the Object Set field in the source file, use its Name Singular property. From Alation version 2023.3.3, you can also specify the object set’s object ID, or a combination of the name and object ID, in the form
object_id | object_name
. Using an object ID can improve performance in updating from an uploaded data dictionary.For each Permitted Type included in the field, add an individual entry to the head of the file. For example, if the object set field
test
can reference columns and articles, the head should include individual definitions for either columns (test:attribute
) or articles (test:article
) (if one type of value is being updated) or both (if both types of value are updated):test:attribute,test:article
.If you are uploading data object names as values, you need to use the qualified name of the object as the value, including the data source ID. For example:
70.schema1
- value forschema1
in data source with id =70
70.schema1.tableA
- value for tabletableA
inschema1
in data source with id =70
70.schema1.tableA.volleyball
- value for columnvolleyball
intableA
inschema1
in data source with id =70
Example
Assume that there is an Object Set field Test
on the catalog page of a Data Source A. The field includes all Permitted Types.
You want to import values for this field using a data dictionary. The definition of the field Test
that you need to include into the file head will look like this:
test:data,test:schema,test:table,test:attribute,test:article,test:groupprofile,test:user,test:glossary_term,test:business_policy
The actual values may look like this:
70,70.schema1,70.schema1.tableA,70.schema1.tableA.volleyball,Article About Schema1,qagroup,[email protected],Privacy,Customer Access Policy
Multiple values for the same object type should be separated by semicolons:
70.schema1;70.schema2;80.schema1
The definition for the People Set custom field must include the name of the field and the definition for the object name that is referenced by the value in the following format: field name:object type
.
For People Set, you can use two object types:
user
groupprofile
Each of these types must have an individual entry in the file head.
Example
Assume that there is a People Set field Expert
on the catalog page of a Data Source A.
You want to import a value for this field Bill Travis
. Bill has an account in Alation with the username bill.travis@example.com
.
The definition of the field Expert
that you need to include into the file head will look like this:
expert:user
The actual value to be uploaded will look like this:
If the set of values to be uploaded includes both users and groups, then the head should include individual definitions for both the object types and will look like this:
expert:user,expert:groupprofile
The actual value to be uploaded will look like this:
[email protected],qa group
Multiple values for one type of value must be separated by semicolons:
Note
In the data dictionary file downloaded from Alation, you’ll notice that the People Set type of values have the format of <object_ID>|<value>
, for example: 524|sally@alationmail.com
. This is a more technical format that reduces the time to identify a catalog object during the processing of the source file. Also, if there are conflicting values in the catalog, IDs are the only way to differentiate them.
Alation also supports simpler formats, such as just <object_ID>
and <value>
.
To find an object ID, open the catalog page of this object. The ID will be present in the URL, for example:
For a user
https://<base_URL>/user/524/
—The object ID is524
. The corresponding value in the data dictionary would be524|sally@alationmail.com
.For a group
https://https://<base_URL>/group/22/
—The object ID is22
. The corresponding value in the data dictionary would be22|curators
.
For the Reference type of field, the head must use the following format: field name:object type
.
Permitted object types and the format are the same as for Object Set.
For the Picker type of field, the head must use the following format: field name
.
Example
Head:country
Value:Romania
Head requirements for this field type are the same as for Picker. Because there can be multiple values added to the field of this type, the multiple values must use the following format: "[""Value1"",""Value2"",""Value3""]"
.
The special value "[]"
resets the value of the multipicker field.
Example
The field Countries is a multi-select picker:
Head: countries
Values: "[""Germany"",""Canada"",""Korea""]"
For the Rich Text type of field, the head must use the following format: field name
.
The value must use the value
format. For example: This is a short summary.
For the Date type of field, the head must use the following format: field name
.
The value must use the YYYY/MM/DD
format. For example: 2019/02/17
.
Formats Cheat Sheet¶
Custom Field Type |
Format for Head |
Example for Head |
Format for Uploaded Value |
Value Example |
---|---|---|---|---|
Object Set |
|
|
|
|
People Set |
|
|
|
|
Reference |
|
|
|
|
Picker |
|
|
|
|
Multi-Select Picker |
|
|
|
|
Rich Text |
|
|
|
|
Date |
|
|
|
|
Special Characters in Custom Field Names¶
The data dictionary upload process considers some characters as special for its own formatting. Custom field names containing such characters require special formatting:
If a custom field name contains a pipe (
|
) or colon (:
) character, the whole field name should be wrapped in quotes (""
).If a custom field name contains a comma (
,
), the whole field name should be wrapped in quotes (""
) and the comma should be escaped with a backslash (\
).
The following table shows how some sample custom field names should appear on the column header line for different use cases:
Custom Field name |
Column Header format |
---|---|
|
|
|
|
|
|
|
|
|
|
Example CSV¶
2023.3.3 and Newer¶
al_datadict_item_properties,al_datadict_item_column_data_type,key,title,description,contains pii,critical data element,data quality policies,data quality status,data quality summary,derived term:glossary_term,expert:groupprofile,expert:user,status,steward:groupprofile,steward:user
oid=42;otype=table,,"""alation_edw.finance"".bnk_cust_dim",Bank Customer Details,This table is most often used by the Sales Operation Team. Data in this table is from customers in the banks North America region.,,_,_,,,_,_,_,_,,11|[email protected]
oid=11846;otype=attribute,"NUMBER(38,0)","""alation_edw.finance"".bnk_cust_dim.account_number",Account Number,,,,,,_,_,,,,,
oid=7667;otype=attribute,VARCHAR(31),"""alation_edw.finance"".bnk_cust_dim.city",City,,,,,,_,_,,,,,
oid=7854;otype=attribute,"NUMBER(38,0)","""alation_edw.finance"".bnk_cust_dim.custid",Customer unique ID,,,,,,_,_,,,,,
oid=8325;otype=attribute,VARCHAR(10),"""alation_edw.finance"".bnk_cust_dim.firstname",First Name,,,,,,_,_,,,,,
oid=8192;otype=attribute,VARCHAR(12),"""alation_edw.finance"".bnk_cust_dim.lastname",Lastname,,,,,,_,_,,,,,
oid=7549;otype=attribute,VARCHAR(2),"""alation_edw.finance"".bnk_cust_dim.state",State,,,,,,_,_,,,,,
2023.3.1 and Earlier¶
key,title,description,closure rule,dq conformity,has pii,stewards:groupprofile,stewards:user,risk & sensitivity,gold master:attribute,gdpr data compliance,policy:article,relevant dataset:table,pii compliance
census.annres_2010_2014,,,N/A,N/A,N/A,,,N/A,,N/A,,N/A,
census.annres_2010_2014.pop_2010,,,,,,,,,N/A,,,,
census.annres_2010_2014.pop_2014,,,,,,,,,N/A,,,,
census.annres_2010_2014.pop_2012,,,,,,,,,N/A,,,,
census.annres_2010_2014.pop_2011,,,,,,,,,N/A,,,,
census.annres_2010_2014.pop_2013,,,,,,,,,N/A,,,,
census.annres_2010_2014.cbsa,Core-Based Statistical Area,,,,,,,,N/A,,,,