Curation Progress Report
Alation Cloud Service: Applies to Alation Cloud Service instances of Alation
Customer Managed: Applies to customer-managed instances of Alation
The Curation Progress report is part of Analytics Stewardship.
The report is available in two views:
Curation Progress—Summary
Note
The Stewardship Dashboard and Curation Progress—Summary are not available in the New User Experience.
The Stewardship Dashboard contains an at-a-glance summary of the Curation Progress report that is pre-filtered for the logged-in user as Steward. Curation completion percentage for catalog objects is represented as a horizontal bar graph.
To open the full report from the Stewardship Dashboard, click the Detailed View link to the right of the report title.
The summary view offers several filters you can use to narrow the focus of the report:
Steward—Value of the Steward field on a catalog object. The default value is the logged-in user. If you select a different user or group, the report will update to show the curation progress of the selected steward. You can also remove the Steward filter and view curation progress for all stewards.
Note
The Steward field on catalog objects is a built-in field that allows assigning catalog objects to users. You can assign users, groups, or People Sets as stewards.
If you remove the Steward filter, the report may become too large to display. In such a case, you will see a warning in the user interface that the report only displays partial curation data.
Data Type—List filter allowing you to narrow down the report to one type of RDBMS object (data sources, schemas, tables, or columns).
Filter In—Searchable filter that allows searching and filtering for a specific parent object. You can select a specific data source, schema, or table. The report will update to show all children of the selected parent.
Note
When you set the Filter In filter, Alation will auto-set the Data Type filter to the child object type of the selected object and disable the Data Type filter. For example, if you select a schema Schema A in Filter In, then the Data Type filter will be automatically set to Tables and disabled.
You can use the Data Type and Filter In filters together if you select a child data type in Data Type and then a parent object in Filter In. For example, if you select Schemas in Data Type, you can then select a specific data source in Filter In to view the curation progress for all schemas in that data source.
You can also use Filter In and Data Type as independent filters selecting one or the other depending on the purpose of analysis:
To view the curation progress for a specific type of data object, use the Data Type filter.
To view the curation progress for a specific object, use the Filter In filter.
Curation Progress—Full Report
To open the full-page Curation Progress report:
Open the Curate and Govern page.
Under Monitor, click Curation Progress to open the report.
In the full view, Curation Progress includes the Curation Progress Summary and Per Object Details reports. Filtering affects both the summary progress bar and the Per Object Details displayed.
Per Object Details
The Per Object Details part of the report has the following fields:
Name—Name of the object. To review the page of a specific object, click its name in this field.
Title—Title of the object generated by Lexicon.
Curation Progress—Curation progress bar, in percent.
Popularity—Object popularity bar. By default, the view is sorted in descending order by this column, with the most popular object on top.
Contains—Number and type of child objects, for example: 9 schemas. Click this number to filter the Curation Progress bar on top by this object. The corresponding child object type will automatically be added to the Data Type filter and the corresponding parent object will appear in the Filter In filter.
The filtering capabilities are the same as in the Curation Progress—Summary.
Understanding the Curation Progress Value
Curation Progress shows how much of the logical metadata—or catalog field values—has been filled in for data objects in the catalog. Only custom fields that have already been applied to the custom template for the given object type (data source, schema, table, and column) are considered for the calculation. If you add new custom fields to the templates, it will change the curation progress for the object.
Curation progress of a data object is a weighted measure of:
Direct curation status that shows how many logical metadata fields are filled in for this object.
Values shared via catalog sets are also taken into account. Any empty shared fields of a catalog set will be counted as “not curated” in the overall curation calculation. For details on catalog sets, see Create and View Catalog Sets.
Curation status of child objects, which is the aggregated curation status of child objects weighted by popularity.
Because Popularity can equal zero, the formula uses 1 + popularity as the weight of each child object.
An object cannot reach 100% curation progress unless all its child objects have 100% curation progress too.
Weights for direct curation status of the object and curation status of its child objects are assumed to be equal. We assign these parts of the calculation 50% each.
If an object doesn’t have children, then the weight for the direct object curation status is 100%.
Calculation
The calculation in the Curation Progress report is performed on the Title, Description, built-in, and custom fields for each object as well as all of the child objects of the parent object. All of the fields must be curated, or filled with a value, for an object to reach 100% curation progress.
To calculate curation progress for each object type, we start with calculating the direct curation status of column objects. For each column, the curation status is calculated using the following formula:
∑(has_value) / number of fields
The has_value value is a Boolean value that indicates whether a field is filled (1 for filled and 0 for empty).
The number of fields value is the number of fields that participate in this calculation.
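The direct curation status formula can be illustrated with a minimal Python sketch. This is not Alation code: the function name, the field names in the example, and the rule for what counts as an empty value are all assumptions made for illustration.

```python
def direct_curation_status(fields: dict) -> float:
    """Return sum(has_value) / number of fields for one object.

    `fields` maps a field name to its current value. Treating None, empty
    strings, and empty collections as "not filled" is an assumption here.
    """
    if not fields:
        # Assumption: an object with no participating fields scores 0.
        return 0.0
    has_value = [0 if value in (None, "", [], {}) else 1 for value in fields.values()]
    return sum(has_value) / len(fields)


# Hypothetical column with four participating fields, two of them filled.
column_fields = {
    "title": "Customer ID",
    "description": "",
    "steward": None,
    "pii_flag": "yes",
}
status = direct_curation_status(column_fields)
print(f"{status:.0%}")  # 2 of 4 fields filled
```

With two of four fields filled, the column's direct curation status is 50%.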
Then we move up the object hierarchy to each next level (table, schema, and data source) and compute the curation progress for each object using the following formula:
[50% * ∑(has_value) / number of fields] + [50% * ∑(child object curation status * (1 + child object Popularity)) / ∑(1 + child object Popularity)]
The value of 50% * ∑(has_value) / number of fields is the direct curation status of the current object.
The value of 50% * ∑(child object curation status * (1 + child object Popularity)) / ∑(1 + child object Popularity) is the average curation status of child objects weighted by popularity.
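The roll-up through the object hierarchy can be sketched in Python as follows. The `CatalogObject` class and its attribute names are hypothetical and exist only for this example. The sketch applies the 50/50 weighting described above and falls back to the direct status alone when an object has no children.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogObject:
    # Hypothetical structure for illustration; not an Alation data model.
    filled_fields: int
    total_fields: int
    popularity: float = 0.0
    children: list["CatalogObject"] = field(default_factory=list)


def curation_progress(obj: CatalogObject) -> float:
    """Curation progress of an object as a fraction between 0 and 1."""
    direct = obj.filled_fields / obj.total_fields if obj.total_fields else 0.0
    if not obj.children:
        # No children: the direct curation status carries 100% of the weight.
        return direct
    # Each child is weighted by (1 + popularity) so zero popularity still counts.
    weighted_sum = sum(curation_progress(c) * (1 + c.popularity) for c in obj.children)
    weight_total = sum(1 + c.popularity for c in obj.children)
    return 0.5 * direct + 0.5 * weighted_sum / weight_total


# Hypothetical table: 1 of 2 fields filled, with two columns.
popular_column = CatalogObject(filled_fields=4, total_fields=4, popularity=3)
unused_column = CatalogObject(filled_fields=0, total_fields=4, popularity=0)
table = CatalogObject(filled_fields=1, total_fields=2,
                      children=[popular_column, unused_column])
progress = curation_progress(table)
```

In this example the direct status is 0.5 and the popularity-weighted child status is (1.0 * 4 + 0.0 * 1) / (4 + 1) = 0.8, so the table's progress is 0.5 * 0.5 + 0.5 * 0.8 = 0.65. Calling the function recursively mirrors the bottom-up order described above, from columns to tables, schemas, and data sources.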
This way, curation progress is calculated for each object type:
Table (with columns as child objects)
Schema (with tables as child objects)
Data source (with schemas as child objects)
In the Curation Progress report that you see in Alation, the results of the calculations for each object are aggregated as an average across all objects of the selected object type, depending on the filter you set. For example, if the report shows a result of 16% for data sources, this value is the average curation progress across all data sources, calculated as the sum of the curation progress values for all data sources divided by the number of data sources in the catalog.
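The report-level aggregation is a plain arithmetic mean, as in the sketch below. The function name and the three per-data-source values are invented for illustration; they are chosen only so that the average comes out to 16%.

```python
def report_average(progress_values: list[float]) -> float:
    """Average per-object curation progress for the selected object type."""
    if not progress_values:
        # Assumption: an empty selection is reported as 0.
        return 0.0
    return sum(progress_values) / len(progress_values)


# Hypothetical curation progress of three data sources.
data_source_progress = [0.10, 0.30, 0.08]
average = report_average(data_source_progress)
print(f"{average:.0%}")  # (0.10 + 0.30 + 0.08) / 3
```

Because the average is unweighted at this level, a data source with many objects counts the same as one with few.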