Update Source Field for Dataflow Objects
Applies to customer-managed instances of Alation.
Perform these steps to add and populate the Source field for dataflow objects after updating Alation to version 2022.4.
This procedure applies only if you enabled Lineage V3 before version 2022.4. You must run the script described below so that the Source filter on Lineage diagrams filters dataflow objects correctly.
Note
Starting in version 2022.4, you can filter Lineage graphs by the data source they were generated from, using the Source field on dataflow objects. For the filter to work correctly, this field must have a value. The script populates the Source field for dataflow objects that were created before version 2022.4, when this field was not populated.
In version 2023.1.6, the script was updated to fix an issue where group IDs were not created for certain link types. When you update to version 2023.1.6, check whether you need to run the script, as described in Check if Using this Script is Required.
If you use a high availability (HA) pair, run the script on the primary server.
Prerequisites
Determine Which Lineage Service You Are Using
If you are not sure which Lineage service your instance uses, check it as follows:
Use SSH to connect to the server and enter the Alation shell.
sudo /etc/init.d/alation shell
From the Alation shell, check if Lineage V3 is in use.
alation_conf lineage-service.enabled
This command should return the value True:
lineage-service.enabled = True
If this check returns False, you are not using Lineage V3. Do not run the script; it does not apply to your instance.
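If you script this check, for example as part of an upgrade runbook, the decision comes down to reading the value after the equals sign. The following is a minimal Python sketch, assuming the output has the `lineage-service.enabled = True` form shown above; the helper name `lineage_v3_enabled` is hypothetical and not part of Alation.

```python
def lineage_v3_enabled(conf_output: str) -> bool:
    """Return True if `alation_conf lineage-service.enabled` output reports True.

    Hypothetical helper: assumes lines of the form
    'lineage-service.enabled = True', as shown in this procedure.
    """
    for line in conf_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "lineage-service.enabled":
            # Compare case-insensitively so 'True' and 'true' are treated alike.
            return value.strip().lower() == "true"
    # Key not found: treat as "not enabled" rather than guessing.
    return False
```

Only proceed to the next check when this returns True; a missing key is treated the same as False so the script is never run on an instance where Lineage V3 could not be confirmed.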
Check if Using this Script is Required
If you are using Lineage V3, check whether any dataflow objects have an unpopulated Source field. If they do, run the script. If they don't, you do not need to run the script on your instance.
From the Alation shell, enter the Postgres shell.
alation_psql
Run the following queries:
SELECT count(*) FROM object_lineage_dataflow WHERE lineage_source_group_id IS NULL;
\c lineage
SELECT count(fp) FROM vertex WHERE is_temp=false and (agreegate_dataflow_fp <> '') IS TRUE and group_id IS NULL;
If either query returns a non-zero count, run the script using the steps in Script Usage below.
Exit the Postgres shell.
\q
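The decision rule for the two queries can be sketched in Python. This is a minimal sketch, assuming you capture each query's result in psql's default aligned format (a `count` header, a dashed separator line, the value, then a row count); the helper names `parse_psql_count` and `script_required` are hypothetical.

```python
def parse_psql_count(output: str) -> int:
    """Extract the single integer from psql's default aligned output for one query.

    Expected shape (illustrative):
         count
        -------
             0
        (1 row)
    """
    lines = output.strip().splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        # The header separator is a run of dashes; the value is on the next line.
        if stripped and set(stripped) == {"-"}:
            return int(lines[i + 1].strip())
    raise ValueError("no count found in psql output")


def script_required(*counts: int) -> bool:
    """The script is needed if any prerequisite query returned a non-zero count."""
    return any(count != 0 for count in counts)
```

Pass the output of each query separately to `parse_psql_count`, then call `script_required` with both results.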
Script Usage
Run the script from the Alation shell.
Ensure that the Lineage V3 service is in a healthy state.
alation_supervisor status lineage
This command should return the status RUNNING:
lineage RUNNING pid 1184, uptime 5 days, 19:10:35
Check that the Event Bus is running.
alation_supervisor status event-bus:*
This command should return the status RUNNING for both Event Bus processes:
event-bus:kafka-server RUNNING pid 1128, uptime 5 days, 19:11:05
event-bus:zookeeper-server RUNNING pid 1127, uptime 5 days, 19:11:05
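Both health checks above reduce to the same rule: every listed process must report RUNNING. A minimal Python sketch of that rule, assuming the `<name> <STATE> <details...>` line layout shown in the example outputs; the helper names `parse_supervisor_status` and `all_running` are hypothetical.

```python
from typing import Dict


def parse_supervisor_status(output: str) -> Dict[str, str]:
    """Map each process name to its state, e.g. {'lineage': 'RUNNING'}.

    Assumes lines of the form '<name> <STATE> <details...>', as shown above.
    """
    states: Dict[str, str] = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            states[parts[0]] = parts[1]
    return states


def all_running(output: str) -> bool:
    """True only if at least one process is listed and every one is RUNNING."""
    states = parse_supervisor_status(output)
    return bool(states) and all(state == "RUNNING" for state in states.values())
```

Requiring at least one process guards against empty output being mistaken for a healthy service.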
Enter the Django shell.
alation_django_shell
Ensure that the Event Bus is consuming published messages.
from alation_event_bus_utils import check_event_bus
check_event_bus()
The command should return success:
Out[1]: {'success': 'Successfully published and consumed a message'}
Note
If you see errors like the example below, do not proceed; contact Alation Support.
Example error:
%3|1668643409.805|FAIL|rdkafka#producer-1| [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Connect to ipv4#127.0.0.1:9092 failed: Connection refused (after 0ms in state CONNECT)
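The example error appears to follow librdkafka's default log-line layout, `%level|timestamp|facility|client| message`, where the level is a syslog-style severity and 3 means error. The sketch below, written under that assumption, combines the two signals from this step: the dictionary that `check_event_bus()` returned and any Kafka client lines printed alongside it. The helper names are hypothetical, and the sketch does not call `check_event_bus()` itself.

```python
import re
from typing import Iterable, Mapping, NamedTuple, Optional


class RdKafkaLog(NamedTuple):
    level: int        # syslog-style severity: 3 = error, 4 = warning, 7 = debug
    timestamp: float  # seconds since the epoch, with milliseconds
    facility: str     # e.g. "FAIL"
    client: str       # e.g. "rdkafka#producer-1"
    message: str


# Assumed layout: %<level>|<timestamp>|<facility>|<client>| <message>
_RDKAFKA_LINE = re.compile(r"^%(\d)\|([\d.]+)\|([^|]*)\|([^|]*)\|\s?(.*)$")


def parse_rdkafka_line(line: str) -> Optional[RdKafkaLog]:
    """Parse one librdkafka-style log line; return None for any other text."""
    match = _RDKAFKA_LINE.match(line.strip())
    if match is None:
        return None
    level, ts, facility, client, message = match.groups()
    return RdKafkaLog(int(level), float(ts), facility, client, message)


def event_bus_check_passed(result: Mapping[str, str], output_lines: Iterable[str]) -> bool:
    """Pass only if the result has a 'success' key and no error-level Kafka line appeared."""
    if "success" not in result:
        return False
    for line in output_lines:
        entry = parse_rdkafka_line(line)
        if entry is not None and entry.level <= 3:
            return False
    return True
```

Treating any severity of 3 or lower as blocking matches the instruction above: if such an error appears, stop and contact Alation Support rather than running the script.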
If all the previous checks are successful, run the script:
from rosemeta.tasks.migrations import deferred_lineage_group_sync
deferred_lineage_group_sync.delay()
The script creates a background job, deferred_lineage_group_sync, which you can monitor in Admin Settings > Monitor > Active Tasks. When the job finishes, the status of the corresponding task changes to Completed.
The script adds the Source field to dataflow objects that have source information in the corresponding lineage link and whose source nodes are not of type external or dataflow_component.
Exit the Django shell.
exit
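The selection rule described above can be illustrated with a small Python model. This is a sketch only: the `LineageLink` class, its fields, the example type "table", and the function `dataflows_to_update` are hypothetical simplifications (one source node type per link) and do not reflect Alation's schema or the script's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional

# Node types that, per this procedure, do not receive a Source value.
EXCLUDED_SOURCE_TYPES = {"external", "dataflow_component"}


@dataclass
class LineageLink:
    """Hypothetical, simplified model of a lineage link; not Alation's schema."""
    dataflow_id: int
    source_info: Optional[str]  # data source recorded on the link, if any
    source_node_type: str       # illustrative values: "table", "external", ...


def dataflows_to_update(links: List[LineageLink]) -> List[int]:
    """Return IDs of dataflow objects that would receive a Source value."""
    return [
        link.dataflow_id
        for link in links
        if link.source_info and link.source_node_type not in EXCLUDED_SOURCE_TYPES
    ]
```

In this model, a dataflow is skipped either because its link carries no source information or because its source node is one of the two excluded types.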
Log Location
The logs are written to celery-lineagepublishing_error.log and lineage_error.log in /opt/alation/site/logs inside the Alation shell. You can use grep to view the log entries for the rosemeta.tasks.migrations.deferred_lineage_group_sync task.
grep rosemeta.tasks.migrations.deferred_lineage_group_sync /opt/alation/site/logs/celery-lineagepublishing_error.log
Example output that indicates success:
"message": "Task rosemeta.tasks.migrations.deferred_lineage_group_sync[c43d9fd7-0080-4892-bc45-ea818ddda2d6] succeeded in 0.3742910510045476s: None"
If the script fails, the log records the “failed” state. In that case, contact Alation Support.
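If you prefer to check the outcome programmatically rather than reading grep output, the following Python sketch maps each task ID to its last logged state. It assumes success lines look like the example above ("Task <name>[<id>] succeeded in ..."); it also assumes that failure lines use the word "failed" (as this procedure describes) or "raised", so verify this against your own logs. The helper name `task_states` is hypothetical.

```python
import re
from typing import Dict, Iterable

TASK = "rosemeta.tasks.migrations.deferred_lineage_group_sync"

# Matches e.g. "Task <TASK>[<task-id>] succeeded in 0.37s: None".
# Assumption: failure lines use "failed" or "raised" in the same position.
_TASK_STATE = re.compile(
    r"Task " + re.escape(TASK)
    + r"\[(?P<task_id>[0-9a-f-]+)\] (?P<state>succeeded|failed|raised)"
)


def task_states(log_lines: Iterable[str]) -> Dict[str, str]:
    """Map each deferred_lineage_group_sync task ID to 'succeeded' or 'failed'."""
    states: Dict[str, str] = {}
    for line in log_lines:
        match = _TASK_STATE.search(line)
        if match:
            state = match.group("state")
            # Later lines overwrite earlier ones, so the last logged state wins.
            states[match.group("task_id")] = "succeeded" if state == "succeeded" else "failed"
    return states
```

Because iterating over an open file yields its lines, you could pass an open file object for /opt/alation/site/logs/celery-lineagepublishing_error.log directly to `task_states`.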
Validate Success
After the deferred_lineage_group_sync task completes, validate the result as follows:
From the Alation shell, enter the Postgres shell.
alation_psql
The migration has succeeded when the SQL query below shows a count of zero:
\c lineage
SELECT count(fp) FROM vertex WHERE is_temp=false and (agreegate_dataflow_fp <> '') IS TRUE and group_id IS NULL;
By the time you run these verification queries, the lineage database may contain more groups than rosemeta. This is expected and not an issue. Contact Alation Support if the count in the lineage database is less than the count in rosemeta.
Exit the Postgres shell.
\q
Exit the Alation shell.
exit
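The validation criteria above can be summarized in a short Python sketch. It assumes you already have three numbers: the result of the verification query (`ungrouped_vertices`) and the group counts in the lineage and rosemeta databases. This procedure does not specify how to obtain the two group counts, and the function and parameter names are hypothetical.

```python
from typing import List


def validate_migration(ungrouped_vertices: int,
                       lineage_group_count: int,
                       rosemeta_group_count: int) -> List[str]:
    """Return follow-up actions; an empty list means validation passed."""
    actions: List[str] = []
    if ungrouped_vertices != 0:
        # The verification query must return zero for the migration to succeed.
        actions.append("vertices without group_id remain: migration has not succeeded")
    if lineage_group_count < rosemeta_group_count:
        # More groups in lineage than in rosemeta is expected; fewer is not.
        actions.append("lineage has fewer groups than rosemeta: contact Alation Support")
    return actions
```

Note the asymmetry: a higher group count in lineage is deliberately not flagged, because groups may continue to be created there after the task finishes.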