Clean Up Stewardship Records

Applies from version 2025.1.4

This procedure describes how to run the Stewardship Dashboard cleanup script, which resolves data inconsistencies in the Stewardship Dashboard. Use it if you see the following issue: the Object Without Stewards report on the Stewardship Dashboard incorrectly lists objects that do have stewards.

You can run this script at your convenience after the Alation update to 2025.1.4 is complete and all Alation components (Alation, Alation Analytics, Alation Connector Manager) have been updated. Running the script doesn’t require system downtime, and users can continue working on the instance while it runs.

The script queries and deletes records only from the stewardship_objectcurationstatus table of the server database. The Stewardship Dashboard reads from this table, and various stewardship tasks update it. The script has no dependencies on other tables and doesn’t impact ongoing user operations.

In an HA setup, run the script from the primary node so that any deletions in the stewardship_objectcurationstatus table are properly replicated to the secondary nodes.

The script writes all of its output to the cleanup_object_curation_status_records.log file, located in the /opt/alation/site/logs directory in the Alation shell, alongside other Alation log files.

To run the script:

  1. Use SSH to connect to the Alation server.

  2. Enter the Alation shell using the following command:

    sudo /etc/init.d/alation shell
    
  3. Change the user to alation:

    sudo su alation
    
  4. Open the Django shell using the following command:

    alation_django_shell
    
  5. Execute the following code in the Django shell.

    from rosemeta.one_off_scripts.cleanup_object_curation_status_records import cleanup_object_curation_status_records
    
    cleanup_object_curation_status_records()
    
  6. To track the script’s progress in real time, monitor the cleanup_object_curation_status_records.log file. Below is an example of the log output during a cleanup where 3,000 records were deleted with the default batch size of 1,000:

    Starting object curation status records cleanup process with deletion batch size=1000
    Executing the first query to fetch the dangling/duplicate record ids from object curation status table...
    Fetched 2000 record ids from the first query.
    Executing the second query to fetch the dangling/duplicate record ids from object curation status table...
    Fetched 1000 record ids from the second query.
    Total object curation status records to delete: 3000
    Found 3000 records to delete from object curation status table.
    Processing the batch 1...
    Deleted 1000 object curation status records in this batch.
    Deleted 1000 object curation status records in this batch.
    Deleted 1000 object curation status records in this batch.
    Total object curation status records deleted: 3000
    Completed object curation status records cleanup process in 0.042 seconds.
    
  7. If there are no records eligible for deletion, the script will log the following message, indicating successful completion:

    No dangling/duplicate records found in object curation status table.
    
  8. If the script deletes records and finishes successfully, it logs a message similar to the following:

    Completed object curation status records cleanup process in 0.042 seconds.
    
  9. Once the script has completed, exit the Django shell and then the Alation shell by running the exit command twice.
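
As a rough aid for steps 6 to 8, the following Python sketch summarizes a run from the log text. It is not part of Alation: the function name summarize_cleanup_log and the sample text are hypothetical, and the sketch assumes that each log line contains the messages quoted above, possibly after a timestamp or log-level prefix. To check a real run, pass in the contents of /opt/alation/site/logs/cleanup_object_curation_status_records.log instead of the sample.

```python
import re

# Messages quoted from the log examples in this procedure.
STARTED = "Starting object curation status records cleanup process"
NOTHING_TO_DELETE = "No dangling/duplicate records found in object curation status table."
TOTAL_DELETED_RE = re.compile(r"Total object curation status records deleted: (\d+)")
COMPLETED_RE = re.compile(
    r"Completed object curation status records cleanup process in (\d+(?:\.\d+)?) seconds"
)


def summarize_cleanup_log(log_text):
    """Summarize the most recent cleanup run found in log_text."""
    summary = {"finished": False, "deleted": None, "seconds": None}
    for line in log_text.splitlines():
        if STARTED in line:
            # A new run begins: discard what was collected for earlier runs.
            summary = {"finished": False, "deleted": None, "seconds": None}
        if NOTHING_TO_DELETE in line:
            summary["finished"] = True
            summary["deleted"] = 0
        total = TOTAL_DELETED_RE.search(line)
        if total:
            summary["deleted"] = int(total.group(1))
        completed = COMPLETED_RE.search(line)
        if completed:
            summary["finished"] = True
            summary["seconds"] = float(completed.group(1))
    return summary


# Hypothetical sample, abbreviated from the example log output in step 6.
SAMPLE_LOG = """\
Starting object curation status records cleanup process with deletion batch size=1000
Total object curation status records to delete: 3000
Total object curation status records deleted: 3000
Completed object curation status records cleanup process in 0.042 seconds.
"""

print(summarize_cleanup_log(SAMPLE_LOG))
```

A result with finished set to False suggests that the run is still in progress or stopped early, in which case the error messages described under Handling Script Failures are the next place to look.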

Handling Script Failures

The script is designed to be idempotent: it can be executed multiple times without adverse effects. If a run fails, run the script again. If the issue persists after re-running the script, contact Alation Support or the SRE team for further assistance.
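
To illustrate what idempotent means here, the following toy model deletes matching ids from an in-memory set in batches. It is only a sketch of the general idea and not the script's actual code: the names cleanup, is_stale, and table are hypothetical, and a Python set stands in for the database table.

```python
def cleanup(table, is_stale, batch_size=1000):
    """Delete stale ids from `table` (a set) in batches; return how many were deleted."""
    stale_ids = sorted(record_id for record_id in table if is_stale(record_id))
    deleted = 0
    for start in range(0, len(stale_ids), batch_size):
        batch = stale_ids[start:start + batch_size]
        table.difference_update(batch)  # removing an id that is already gone is harmless
        deleted += len(batch)
    return deleted


def is_stale(record_id):
    # Hypothetical rule for this toy model: ids below 3000 count as stale.
    return record_id < 3000


table = set(range(5000))
first_run = cleanup(table, is_stale)   # deletes the 3000 stale ids in three batches
second_run = cleanup(table, is_stale)  # finds nothing left to delete
print(first_run, second_run, len(table))
```

Because the second run finds nothing to delete and leaves the remaining records untouched, repeating the run after a failure is safe in this model.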

Error Messages:

  • Database error while fetching records: If a database error occurs while fetching records for deletion, you will see a log similar to:

    Database error while fetching object curation status record ids: [exception details]
    
  • Unexpected error while fetching records: If a non-database error occurs while fetching records, the log will be similar to:

    An unexpected error occurred while fetching object curation status record ids: [exception details]
    
  • Database error during batch deletion: If a database error occurs during the deletion of records in a batch, the log will be similar to:

    Database error while deleting object curation status records batch: [exception details]
    
    Problematic first few object curation status record ids (if any): [IDs]
    
  • Unexpected error during batch deletion: If a non-database error occurs during batch deletion, the log will be similar to:

    An unexpected error occurred during object curation status records batch deletion: [exception details]
    
    Problematic first few object curation status record ids (if any): [IDs]
    
  • Incomplete deletion: If some batches fail, the script continues with the remaining batches and logs a warning at the end, similar to:

    Warning: Expected to delete 3000 records, but 2000 were deleted.
    This might be due to errors in some batches, object curation status record ids not existing, or other concurrent modifications.
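
The messages above can also be searched for programmatically. The following Python sketch is a hypothetical helper, not an Alation tool: the function name find_cleanup_problems and the labels in ERROR_MARKERS are assumptions, and it relies only on the message texts quoted in this section appearing somewhere in each log line.

```python
import re

# Message prefixes quoted from the error descriptions in this section.
ERROR_MARKERS = {
    "fetch_db_error": "Database error while fetching object curation status record ids",
    "fetch_unexpected_error": "An unexpected error occurred while fetching object curation status record ids",
    "delete_db_error": "Database error while deleting object curation status records batch",
    "delete_unexpected_error": "An unexpected error occurred during object curation status records batch deletion",
}
INCOMPLETE_RE = re.compile(r"Expected to delete (\d+) records, but (\d+) were deleted")


def find_cleanup_problems(log_text):
    """Return a list of (label, detail) tuples for known problem messages."""
    problems = []
    for line in log_text.splitlines():
        for label, marker in ERROR_MARKERS.items():
            if marker in line:
                problems.append((label, line.strip()))
        incomplete = INCOMPLETE_RE.search(line)
        if incomplete:
            expected = int(incomplete.group(1))
            deleted = int(incomplete.group(2))
            detail = f"{expected - deleted} of {expected} records were not deleted"
            problems.append(("incomplete_deletion", detail))
    return problems


# Hypothetical sample built from the messages shown in this section.
SAMPLE_ERRORS = """\
Database error while deleting object curation status records batch: [exception details]
Warning: Expected to delete 3000 records, but 2000 were deleted.
"""

for label, detail in find_cleanup_problems(SAMPLE_ERRORS):
    print(label, "->", detail)
```

An empty result means that none of the listed messages were found. It does not by itself confirm that the run completed.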