Move the Alation Database to AWS RDS¶
Applies to customer-managed instances of Alation
Available from version 2023.3.5
Overview¶
Alation uses a PostgreSQL database to store customer metadata and other information, such as some event data. Due to the amount of data stored in Alation, this internal database can grow quite large, which can result in performance issues for backups or migrations. To free up space on an Alation customer-managed server, Alation versions 2023.3.5 and later offer server administrators the ability to extract the database from the server and store it on a separate Amazon Web Services (AWS) Relational Database Service (RDS) instance.
We recommend that the RDS instance be located in the same VPC and region as your Alation instance.
Alation has two PostgreSQL databases that can be moved in this way. Many of the steps below must be done for both databases:
rosemeta, the main database
the lineage database
To move the internal Postgres databases to RDS, the databases are first copied to the RDS instance. Alation and its internal Postgres databases continue to function as normal during this process. We use a logical replication process that continues to stream database changes to the new copy until all data has been moved. Once a database copy on RDS is ready, you point Alation to that database, and the internal database becomes unused.
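Under the hood this is standard PostgreSQL logical replication: the internal database publishes changes and the RDS copy subscribes to them. During the migration (Step 5), if you want an extra check beyond the helper scripts, the state of that stream can be observed through PostgreSQL's built-in statistics views. The commands below are an optional illustration only, run from the Alation shell and against your RDS endpoint respectively:
# On the internal (source) Postgres: active replication connections and WAL positions
psql -d rosemeta -U alation -c "SELECT application_name, state, sent_lsn, replay_lsn FROM pg_stat_replication;"
# On the RDS (target) side: subscription workers and how far they have applied changes
PGPASSWORD=<ALATION_PASSWORD> psql -h <RDS_ENDPOINT> -U alation -d rosemeta -c "SELECT subname, received_lsn, latest_end_lsn FROM pg_stat_subscription;"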
We recommend taking a backup before starting this procedure. As a precaution, keep the backup after completing this procedure.
This procedure was designed to cause as little disruption as possible. Most of the procedure can be done with no downtime, and users can continue using Alation without trouble. When a step does require downtime, it will be noted.
The overall process goes like this:
Step 1: Delete Old Data
Step 2: Find and Remove Corrupted Data
Step 3: Gather Information for the RDS Instance
Step 4: Create the RDS Instance
Step 5: Migrate the Data to the RDS Instance
Step 6: Switch Alation to Use the RDS Instance
You can also Troubleshoot the Migration and Maintain Alation with the Database on an AWS RDS Instance.
Each step is described in detail below.
Step 1: Delete Old Data¶
Note
This step can be done without any downtime. Alation users can continue using Alation as usual.
This step is optional but highly recommended. To make the migration to RDS faster and more reliable, we recommend deleting old data from certain large tables. We have provided a script to delete specific data from specific tables. The following table shows which database tables are affected and how far back data is deleted.
Table | Delete Data Older Than
---|---
 | 6 months
 | 5 months
 | 30 days
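For context, what the script does is time-based deletion: rows older than the retention window in each table are removed. Conceptually that is a delete of the shape sketched below, shown against a hypothetical table and timestamp column purely for illustration; always use the provided scan_prune.pyc script rather than hand-written deletes.
# Hypothetical example only -- the real tables, columns, and retention windows are handled by scan_prune.pyc
psql -d rosemeta -U alation -c "DELETE FROM some_event_table WHERE created_at < now() - interval '6 months';"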
To delete old data:
Use SSH to connect to the Alation server.
Enter the Alation shell using the following command:
sudo /etc/init.d/alation shell
Switch to the alation user:
sudo su alation
Run the pre-validation SQL queries and save the results in a separate file:
psql -d rosemeta -U alation -f /opt/alation/ops/postgres_rds_migration/pruning_scripts/pre-validation.sql > /tmp/pre-validation.out
Run the scan_prune.pyc script to delete the old data:
python /opt/alation/django/rosemeta/one_off_scripts/scan_prune.pyc -p purge -i metadata metrics qli > /tmp/scan_prune.log 2>&1
Run the post-validation SQL queries and save the results in a separate file:
psql -d rosemeta -U alation -f /opt/alation/ops/postgres_rds_migration/pruning_scripts/post-dbvalidation.sql > /tmp/post-dbvalidation.out
Search the log file at /tmp/scan_prune.log for the following message:
--- Finished execution of scan problematic tables ---
If you find this message, it indicates that everything is good, and you can skip to Step 2: Find and Remove Corrupted Data.
If you don’t find the finished message, do the following:
Search scan_prune.log for the following message:
The difference between one of pointer table is greater than tolerated difference of
If you find this message in the log, something has gone wrong. Stop here, and contact Alation Support. Send the pre-validation.out, scan_prune.log, and post-dbvalidation.out files.
If you don’t find this message, do the next step.
Search scan_prune.log for the following message:
At this point, The QLI archival did not clear any data. Please review
If you find that message in the log, check for the following messages:
INFO - There's close to 0 records to process for rosemeta_executionevent.
INFO - There's close to 0 records to process for rosemeta_executioneventmention.
INFO - There's close to 0 records to process for rosemeta_executioneventexpressionmention.
If there are close to 0 records to process, you can proceed to Step 2: Find and Remove Corrupted Data.
If not, stop here, and contact Alation Support. Send the pre-validation.out, scan_prune.log, and post-dbvalidation.out files.
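If you prefer to check for these messages from the command line, you can grep the log directly; the strings are the ones quoted above:
grep -e '--- Finished execution of scan problematic tables ---' /tmp/scan_prune.log
grep -e 'greater than tolerated difference' /tmp/scan_prune.log
grep -e 'QLI archival did not clear any data' /tmp/scan_prune.log
grep -e 'close to 0 records to process' /tmp/scan_prune.log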
Step 2: Find and Remove Corrupted Data¶
Note
This step can be done without any downtime. Alation users can continue using Alation as usual.
To ensure the migration works successfully, you have to make sure there’s no corrupted data in Alation’s PostgreSQL databases. If there is corrupted data, it has to be removed.
To find and remove corrupted data:
If you’re not on the Alation server already:
Use SSH to connect to the Alation server.
Enter the Alation shell using the following command:
sudo /etc/init.d/alation shell
Switch to the alation user:
sudo su alation
Run the script db_scan_parallel.pyc against the rosemeta database:
python /opt/alation/ops/postgres_rds_migration/scanning_script/db_scan_parallel.pyc -b /bin/ -h /tmp -p 5432 -U alation -d rosemeta
Note
This may take some time. For example, on a t3.2xlarge instance with a 364 GB PostgreSQL database, it may take about 30 minutes.
Examine the output from db_scan_parallel.pyc.
If the scan detected no corruption, you’ll see this at the end of the output:
Total tables successfully dumped: <#>, failed_tables: []
If you see a message indicating that some tables failed, it means some rows in those tables are corrupted and need to be removed. Make a list of each failed table.
Example failure message:
(ERROR) PID=31625: b'pg_dump: error: Dumping the contents of table "data_storage_blobaccesspostgres" failed: PQgetResult() failed.\npg_dump: error: Error message from server: ERROR: missing chunk number 0 for toast value 3523697 in pg_toast_3131713\npg_dump: error: The command was: COPY public.data_storage_blobaccesspostgres (id, model_code, blob_key, blob_value) TO stdout;'
From the message above, you would take note that the table data_storage_blobaccesspostgres failed.
Run the script db_scan_parallel.pyc against the lineage database:
python /opt/alation/ops/postgres_rds_migration/scanning_script/db_scan_parallel.pyc -b /bin/ -h /tmp -p 5432 -U alation -d lineage
Examine the output from db_scan_parallel.pyc.
If the scan detected no corruption, you’ll see this at the end of the output:
Total tables successfully dumped: <#>, failed_tables: []
If you see a message indicating that some tables failed, it means some rows in those tables are corrupted and need to be removed. Make a list of each failed table.
Example failure message:
(ERROR) PID=31625: b'pg_dump: error: Dumping the contents of table "data_storage_blobaccesspostgres" failed: PQgetResult() failed.\npg_dump: error: Error message from server: ERROR: missing chunk number 0 for toast value 3523697 in pg_toast_3131713\npg_dump: error: The command was: COPY public.data_storage_blobaccesspostgres (id, model_code, blob_key, blob_value) TO stdout;'
Exit from the alation user:
exit
Did the scans find corruption?
If db_scan_parallel.pyc found no corruption in either rosemeta or lineage, you can skip to Step 3: Gather Information for the RDS Instance.
If either scan did find corruption, proceed with the remaining steps below.
If the scan found corruption in one or more tables, you need to identify exactly which rows were corrupted. To do this, we have provided a find_bad_rows function with your Alation installation. To use the function, first switch to the postgres user:
sudo su postgres
Add the find_bad_rows function to Postgres:
psql -h /tmp -d rosemeta < /opt/alation/ops/postgres_rds_migration/corruption_check/find_bad_rows.sql
Execute find_bad_rows as shown below, replacing <FAILED_TABLE_NAME> with the name of a failed table:
psql -h /tmp -d rosemeta -c "select find_bad_rows('public.<FAILED_TABLE_NAME>');"
The function will output the ctid for each corrupted row. Example output:
NOTICE: Corrupted ctid: (56,70)
NOTICE: XX001: missing chunk number 0 for toast value 3523697 in pg_toast_3131713
NOTICE: Corrupted ctid: (57,109)
NOTICE: XX001: missing chunk number 0 for toast value 3523702 in pg_toast_3131713
Note
This step may take a long time, depending on the size of the tables.
Double check each row to make sure it’s corrupted. Run the following command for each corrupted row:
psql -h /tmp -d rosemeta -c "SELECT \* FROM public.<FAILED_TABLE_NAME> where ctid = '(<ROW_CTID>)'"
Example output:
missing chunk number 0 for toast value 3523697 in pg_toast_3131713
Delete the corrupted rows. Run the following command for each corrupted row:
psql -h /tmp -d rosemeta -c "DELETE FROM public.<FAILED_TABLE_NAME> where ctid = '(<ROW_CTID>)'"
Example output:
DELETE 1
Repeat the prior two steps for each corrupted row in the table.
Repeat the prior three steps for each failed table.
Exit from the postgres user:
exit
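Once the corrupted rows are gone, it is worth confirming the cleanup by re-running the same scans from earlier (as the alation user) and checking that the failed_tables list at the end of the output is empty:
python /opt/alation/ops/postgres_rds_migration/scanning_script/db_scan_parallel.pyc -b /bin/ -h /tmp -p 5432 -U alation -d rosemeta
python /opt/alation/ops/postgres_rds_migration/scanning_script/db_scan_parallel.pyc -b /bin/ -h /tmp -p 5432 -U alation -d lineage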
Step 3: Gather Information for the RDS Instance¶
Note
This step can be done without any downtime. Alation users can continue using Alation as usual.
In this step, you’ll make sure you have all the information needed to create the RDS instance.
Ensure that you have AWS keys to create the RDS instance using AWS APIs.
Ensure that the machine you’re using has access to the AWS VPC where you are creating the RDS instance. We recommend that the RDS instance be located in the same VPC and region as your Alation instance.
Determine how much storage your RDS instance will require.
If you’re not on the Alation server already:
Use SSH to connect to the Alation server.
Enter the Alation shell using the following command:
sudo /etc/init.d/alation shell
Enter the PostgreSQL shell:
alation_psql
Display the sizes of the lineage and rosemeta databases:
\l+
The output will look something like this:
                                               List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   |   Size    | Tablespace |                Description
-----------+----------+----------+-------------+-------------+-----------------------+-----------+------------+--------------------------------------------
 lineage   | alation  | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                       | 5774 MB   | pg_default |
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                       | 8221 kB   | pg_default | default administrative connection database
 rdsadmin  | rdsadmin | UTF8     | en_US.UTF-8 | en_US.UTF-8 | rdsadmin=CTc/rdsadmin+| No Access | pg_default |
           |          |          |             |             | rdstopmgr=Tc/rdsadmin |           |            |
 rosemeta  | alation  | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                       | 177 GB    | pg_default |
 template0 | rdsadmin | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/rdsadmin          +| 8041 kB   | pg_default | unmodifiable empty database
           |          |          |             |             | rdsadmin=CTc/rdsadmin |           |            |
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 8213 kB   | pg_default | default template for new databases
           |          |          |             |             | postgres=CTc/postgres |           |            |
(6 rows)
Look in the Size column, add the sizes of the lineage and rosemeta databases together, and then double that number.
Example: In the example output above, lineage is 5774 MB and rosemeta is 177 GB, so the total is about 183 GB. Doubled, that is 366 GB.
If your final number is less than 20 GB, plan to use the minimum of 20 GB. The maximum is 16384 GB. (A one-line alternative that computes the combined size with pg_database_size is sketched at the end of this step.)
Exit the PostgreSQL shell:
\q
Determine the RDS instance class required to hold your PostgreSQL database. The table below shows some recommendations based on the size of your database, concurrent user load, and Alation instance size.

PostgreSQL Object Count | Concurrent User Load | Alation Host Size on AWS | RDS Instance Type Required
---|---|---|---
Up to 5 million | 150 users | m5a.4xl | db.m6g.large
Up to 20 million | 150 users | m5a.8xl | db.m5.2xlarge
Over 20 million | 150 to 800 users | m5a.12xl | db.m5.4xlarge
Figure out how many CPUs you have on your Alation server. This will determine how many migration workers you can use. In the Alation shell, run this command:
lscpu | grep -E '^CPU\('
Example output:
CPU(s): 8
Save this number for use in a later step.
Generate two passwords:
One for RDS administration, referred to as ALATION_RDS_ADMIN_PASSWORD in these instructions.
One for the PostgreSQL user that Alation will use to connect to the migrated PostgreSQL database, referred to as ALATION_PASSWORD in these instructions.
You can generate these passwords however you like, or you can run the following command on Linux twice to generate two random 16-character passwords:
cat /dev/urandom | tr -dc A-Za-z | head -c 16 ; echo ''
Save these two passwords in a secure location. You will need them in later steps.
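If you would rather compute the combined database size in one line instead of reading the \l+ output, PostgreSQL's pg_database_size function can be used from the Alation shell. This is an optional convenience, not part of the official procedure; remember to double the reported total when sizing the RDS storage, as described above.
psql -d rosemeta -U alation -c "SELECT pg_size_pretty(pg_database_size('rosemeta') + pg_database_size('lineage'));"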
Step 4: Create the RDS Instance¶
Note
This step can be done without any downtime. Alation users can continue using Alation as usual.
You’ll now create the AWS RDS instance that you’ll move the Alation database to. We have provided some Terraform scripts in the Alation installation that will automatically create the RDS instance.
To create the RDS instance from the Alation server:
If you’re not on the Alation server already:
Use SSH to connect to the Alation server.
Enter the Alation shell using the following command:
sudo /etc/init.d/alation shell
Install Terraform if it’s not already installed. You can find installers on Terraform’s installation page. As an example, on an AMD64 Linux machine, you could install Terraform like this:
sudo curl -O https://releases.hashicorp.com/terraform/1.0.6/terraform_1.0.6_linux_amd64.zip
sudo unzip terraform_1.0.6_linux_amd64.zip
sudo mv terraform /usr/bin/
Switch to the alation user:
sudo su alation
In the terminal, set the environment variables that the Terraform scripts rely on. Replace the angle brackets and their contents with your own values:
export AWS_ACCESS_KEY_ID=<ADMIN_API_KEY>
export AWS_SECRET_ACCESS_KEY=<ADMIN_SECRET_ACCESS_KEY>
export TF_VAR_alation_region=<ALATION_AWS_REGION>
export TF_VAR_on_prem_alation_ec2_id=<ALATION_EC2_INSTANCE_ID such as i-0f76425cf32057e81>
export TF_VAR_rds_admin_password=<ALATION_RDS_ADMIN_PASSWORD>
export TF_VAR_allocated_storage=<STORAGE_SIZE_IN_GB>
export TF_VAR_multi_az=true
export TF_VAR_rds_instance_class=<RDS_INSTANCE_CLASS>
export TF_VAR_storage_type=gp3
Go into the folder containing the Terraform scripts:
cd /opt/alation/ops/postgres_rds_migration/rds_setup_terraform/
Initialize Terraform:
terraform init
Validate that the Terraform configuration is valid:
terraform validate
If you notice any errors in the output, contact Alation Support.
Verify that the Terraform plan is accurate:
terraform plan
If you notice any errors in the output, contact Alation Support.
Run the Terraform script:
terraform apply --auto-approve
This will automatically create an RDS instance and display its RDS endpoint, which you will need in later steps. This step may take several minutes. Near the end of the output you will see the RDS endpoint. Example output:
data.dns_a_record_set.rds_endpoint: Read complete after 0s [id=onprem-alation-i-02ad22d793b9d6dfa-rds.calwlq0balzb.us-west-2.rds.amazonaws.com]
The value after id= is the RDS endpoint. In this example, it is onprem-alation-i-02ad22d793b9d6dfa-rds.calwlq0balzb.us-west-2.rds.amazonaws.com.
Now that the RDS instance exists, you can migrate the Alation database to it. Proceed to Step 5: Migrate the Data to the RDS Instance below.
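Optionally, before starting the migration, confirm that the Alation server can reach the new RDS instance. A minimal connectivity check, assuming the Terraform scripts created the admin user as postgres (the same user used in the reset steps later in this document), looks like this:
PGPASSWORD=<ALATION_RDS_ADMIN_PASSWORD> psql -h <RDS_ENDPOINT> -U postgres -c "SELECT version();"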
Step 5: Migrate the Data to the RDS Instance¶
Important
This step requires about 10 minutes of downtime.
If this process fails for any reason, you can reset the RDS instance using the steps under Reset the RDS Instance. Then try this section again.
In the following steps, you will copy the rosemeta and lineage databases to your RDS instance.
If you’re not in the Alation shell already:
Use SSH to connect to the Alation server.
Enter the Alation shell using the following command:
sudo /etc/init.d/alation shell
Switch to the alation user:
sudo su alation
Use a text editor to replace the contents of setup_internal_postgres.pyc with the following:
# /opt/alation/env/bin/python
import socket
import sys
import pty
import subprocess

EXPECTED_POSTGRES_VERSIONS = {"13.1", "13.6", "13.8", "13.11"}
PGLOGICAL_PACKAGE = "pglogical"


def get_ip(host):
    return socket.gethostbyname(host)


def execute_command(bash_command, shell_or_die=False):
    master_fd, slave_fd = pty.openpty()
    sp = subprocess.Popen(
        bash_command,
        stdin=slave_fd,
        stderr=subprocess.PIPE,
        stdout=subprocess.PIPE,
        shell=True
    )
    out, err = sp.communicate()
    if shell_or_die and sp.returncode != 0:
        raise Exception("Failed: %s", out.decode(), err.decode())
    return sp.returncode, out.decode(), err.decode()


def version_check():
    cmd = 'sudo -u postgres psql -qtAX -h /tmp -d rosemeta -c "SHOW server_version;"'
    code, stdout, stderr = execute_command(cmd)
    if code != 0:
        raise Exception("Cannot get postgres version")
    elif stdout.strip() not in EXPECTED_POSTGRES_VERSIONS:
        raise Exception(
            "Its version %s is not equal to %s" % (stdout.strip(), str(EXPECTED_POSTGRES_VERSIONS))
        )
    print("## Its version is %s" % str(EXPECTED_POSTGRES_VERSIONS))


def setup_postgresql_auto_conf(num_of_wall_senders):
    cmd = "sudo -u postgres cat /data1/pgsql/13/data/postgresql.auto.conf"
    code, stdout, stderr = execute_command(cmd)
    if code != 0:
        raise Exception("Cannot cat /data1/pgsql/13/data/postgresql.auto.conf")
    if "listen_addresses = '*'" not in stdout:
        cmd = "sudo -u postgres -- sh -c $'echo \"listen_addresses = \\'*\\'\" >> /data1/pgsql/13/data/postgresql.auto.conf'"
        execute_command(cmd, shell_or_die=True)
    if "wal_level = logical" not in stdout:
        cmd = "sudo -u postgres -- sh -c $'echo \"wal_level = logical\" >> /data1/pgsql/13/data/postgresql.auto.conf'"
        execute_command(cmd, shell_or_die=True)
    if "max_replication_slots = %s" % num_of_wall_senders not in stdout:
        cmd = (
            "sudo -u postgres -- sh -c $'echo \"%s\" >> /data1/pgsql/13/data/postgresql.auto.conf'"
            % ("max_replication_slots = %s" % num_of_wall_senders)
        )
        execute_command(cmd, shell_or_die=True)
    if "max_wal_senders = %s" % num_of_wall_senders not in stdout:
        cmd = (
            "sudo -u postgres -- sh -c $'echo \"%s\" >> /data1/pgsql/13/data/postgresql.auto.conf'"
            % ("max_wal_senders = %s" % num_of_wall_senders)
        )
        execute_command(cmd, shell_or_die=True)
    if "shared_preload_libraries = 'pglogical'" not in stdout:
        cmd = "sudo -u postgres -- sh -c $'echo \"shared_preload_libraries = \\'pglogical\\'\" >> /data1/pgsql/13/data/postgresql.auto.conf'"
        execute_command(cmd, shell_or_die=True)
    print("## Completed postgresql.auto.conf update")


def setup_pg_hba_conf(rds_private_ip):
    cmd = "sudo -u postgres cat /data1/pgsql/13/data/pg_hba.conf"
    code, stdout, stderr = execute_command(cmd)
    if code != 0:
        raise Exception("Cannot cat /data1/pgsql/13/data/pg_hba.conf")
    if rds_private_ip not in stdout:
        cmd = (
            "sudo -u postgres -- sh -c 'echo \"host all all %s/32 trust\" >> /data1/pgsql/13/data/pg_hba.conf'"
            % rds_private_ip
        )
        execute_command(cmd, shell_or_die=True)
    if "host all all 127.0.0.1/0 trust" not in stdout:
        cmd = "sudo -u postgres -- sh -c $'echo \"host all all 127.0.0.1/0 trust\" >> /data1/pgsql/13/data/pg_hba.conf'"
        execute_command(cmd, shell_or_die=True)
    if "host all alation ::1/128 trust" not in stdout:
        cmd = "sudo -u postgres -- sh -c $'echo \"host all alation ::1/128 trust\" >> /data1/pgsql/13/data/pg_hba.conf'"
        execute_command(cmd, shell_or_die=True)
    print("## Completed pg_hba.conf update")


def restart_postgres():
    cmd = "alation_action stop_postgres"
    execute_command(cmd, shell_or_die=True)
    cmd = "alation_action start_postgres"
    execute_command(cmd, shell_or_die=True)
    print("## Restart completed")


def main(argv):
    if len(argv) < 2:
        raise Exception("python setup_internal_postgres.py <rds_endpoint> <num_of_wall_senders>")
    rds_endpoint, num_of_wall_senders = argv[0], argv[1]
    rds_private_ip = get_ip(rds_endpoint)
    print(rds_private_ip)
    version_check()
    setup_postgresql_auto_conf(num_of_wall_senders)
    setup_pg_hba_conf(rds_private_ip)
    restart_postgres()


if __name__ == "__main__":
    main(sys.argv[1:])
Important
The next step requires Alation to be down for about ten minutes.
Still as the alation user, run the setup_internal_postgres.pyc script and provide the RDS endpoint and number of CPUs you identified earlier:
python /opt/alation/ops/postgres_rds_migration/rds_migration_scripts/setup_internal_postgres.pyc <RDS_ENDPOINT> <#_CPUs>
Example output:
## Its version is {'13.11', '13.6', '13.1', '13.8'}
## Completed postgresql.auto.conf update
## Completed pg_hba.conf update
## Restart completed
Note
Items 5–7 apply to the rosemeta database. Items 8–10 apply to lineage. These steps can take a long time. You can follow the items for rosemeta in parallel to the items for lineage.
To migrate the rosemeta database to your new RDS instance, run the migrate_postgres_to_rds.pyc script as shown below. This will sync the data to your RDS instance. We use a logical replication process that continues to stream database changes to the new copy until all data has been moved. This ensures no data will be lost.
Important
This process runs asynchronously. It may take a long time. As an example, if you have:
An Alation instance with 8 cores and 32 GB of memory
An RDS instance with 4 cores and 16 GB of memory
It may take about four to five hours to migrate a 175 GB rosemeta database.
To run the script, use the command below. Replace <RDS_ENDPOINT> with the endpoint of your RDS instance, <ALATION_RDS_ADMIN_PASSWORD> with the RDS admin password you created earlier, and <ALATION_PASSWORD> with the Alation password you created earlier.
python /opt/alation/ops/postgres_rds_migration/rds_migration_scripts/migrate_postgres_to_rds.pyc --rds_port 5432 --rds_endpoint <RDS_ENDPOINT> --rds_admin_password <ALATION_RDS_ADMIN_PASSWORD> --new_alation_password <ALATION_PASSWORD> --remove_indexes true --target_database rosemeta
Successful output will look like this:
initialized RateSampler, sample 100% of traces
initialized RateSampler, sample 100% of traces
Connecting to DogStatsd(udp://localhost:8125)
{"data": {"module": "alation_conf", "version": null, "requestid": null, "msg": "Reloading conf. current time: 1702581142.2829413"}, "header": {"timestamp": "2023-12-14T19:12:22.282994", "appname": null, "instanceurl": null, "tenantid": null, "loglevel": "DEBUG", "traceid": "0"}, "dd.trace_id": "0", "dd.span_id": "0", "dd.env": ""}
{"data": {"module": "alation_conf", "version": null, "requestid": null, "msg": "alation_conf is initialized. current time: 1702581142.2829413"}, "header": {"timestamp": "2023-12-14T19:12:22.467949", "appname": null, "instanceurl": null, "tenantid": null, "loglevel": "DEBUG", "traceid": "0"}, "dd.trace_id": "0", "dd.span_id": "0", "dd.env": ""}
Check pg_hba entry for replication...
Check pg_hba entry for replication...Done
Check listen_addresses...
Check listen_addresses...Done
Check if migration_rosemeta already created
Drop file_fdw extension
Start export postgres schema only...
Create indexes backup file ...Done
Removed indexes from schema file ...Done
Start export postgres schema only...Done
Check if alation exists
Create new alation user in RDS...
Prepare alation user in RDS...Done
Check if alation_db_user exists
Create new alation_db_user user in RDS...
Prepare alation user in RDS...Done
Check if alation_db exists
Create new alation_db user in RDS...
Prepare alation user in RDS...Done
Check if datadog exists
Create new datadog user in RDS...
Prepare alation user in RDS...Done
Check if rosemeta exists
No rosemeta exists
Create new alation user...
Build existing extension set
Create extensions intarray,hstore,amcheck,pglogical,pg_visibility if not exists ...
Create extensions intarray,hstore,amcheck,pglogical,pg_visibility ...Done
Import /tmp/rosemeta_schema_only.dump ...
Import /tmp/rosemeta_schema_only.dump ...Done
List target tables own by public and alation
Check if publication migration_rosemeta is created
migration_rosemeta is created
Validate publication table by table
All tables are included in public
Check if subscription migration_rosemeta already created
## create subscription migration_rosemeta connection 'host=<RDS IP> port=5432 dbname=rosemeta user=alation' publication migration_rosemeta
Stopping AgentWriter thread
Check on the progress of migrating the rosemeta database by running get_subscription_status.pyc. Replace <RDS_ENDPOINT> with the endpoint of your RDS instance and <ALATION_PASSWORD> with the Alation password you created earlier.
python /opt/alation/ops/postgres_rds_migration/rds_migration_scripts/get_subscription_status.pyc --rds_endpoint <RDS_ENDPOINT> --rds_port 5432 --new_alation_password <ALATION_PASSWORD> --target_database rosemeta
While the process is still running, you’ll see the following in the output:
## res: NEED_TO_WAIT
When the process is finished, you’ll see the following in the output:
## res: STOP_WAITING
When get_subscription_status.pyc shows ## res: STOP_WAITING, it automatically starts another script, create_indexes.py. This script creates indexes for all the tables in the rosemeta database. It runs in the background and must finish before you can move on to Step 6: Switch Alation to Use the RDS Instance below. You can check the status of this script by tailing the log file (an optional index-count check is also sketched at the end of this step):
tail -f /tmp/create_indexes.log
When the process is done, you’ll see this in the log:
Created 2090/2091 indexes.
Created 2091/2091 indexes.
Creation of indexes /tmp/rosemeta_postdata_only.dump ...Done
Removed indexes dump file /tmp/rosemeta_postdata_only.dump since indexes have been applied.
Note
Items 8–10 below apply to the lineage database. You can follow these steps in parallel to items 5–7.
To migrate the lineage database to your new RDS instance, run the migrate_postgres_to_rds.pyc script as shown below. Replace <RDS_ENDPOINT> with the endpoint of your RDS instance, <ALATION_RDS_ADMIN_PASSWORD> with the RDS admin password you created earlier, and <ALATION_PASSWORD> with the Alation password you created earlier.
python /opt/alation/ops/postgres_rds_migration/rds_migration_scripts/migrate_postgres_to_rds.pyc --rds_port 5432 --rds_endpoint <RDS_ENDPOINT> --rds_admin_password <ALATION_RDS_ADMIN_PASSWORD> --new_alation_password <ALATION_PASSWORD> --remove_indexes true --target_database lineage
This process runs asynchronously. Successful output will look like this:
initialized RateSampler, sample 100% of traces
initialized RateSampler, sample 100% of traces
Connecting to DogStatsd(udp://localhost:8125)
{"data": {"module": "alation_conf", "version": null, "requestid": null, "msg": "Reloading conf. current time: 1702581232.9394724"}, "header": {"timestamp": "2023-12-14T19:13:52.939548", "appname": null, "instanceurl": null, "tenantid": null, "loglevel": "DEBUG", "traceid": "0"}, "dd.trace_id": "0", "dd.span_id": "0", "dd.env": ""}
{"data": {"module": "alation_conf", "version": null, "requestid": null, "msg": "alation_conf is initialized. current time: 1702581232.9394724"}, "header": {"timestamp": "2023-12-14T19:13:53.135925", "appname": null, "instanceurl": null, "tenantid": null, "loglevel": "DEBUG", "traceid": "0"}, "dd.trace_id": "0", "dd.span_id": "0", "dd.env": ""}
Check pg_hba entry for replication...
Check pg_hba entry for replication...Done
Check listen_addresses...
Check listen_addresses...Done
Check if migration_lineage already created
Drop file_fdw extension
Start export postgres schema only...
Create indexes backup file ...Done
Removed indexes from schema file ...Done
Start export postgres schema only...Done
Check if alation exists
Reset alation user password in RDS...
Prepare alation user in RDS...Done
Check if alation_db_user exists
Reset alation_db_user user password in RDS...
Prepare alation user in RDS...Done
Check if alation_db exists
Reset alation_db user password in RDS...
Prepare alation user in RDS...Done
Check if datadog exists
Reset datadog user password in RDS...
Prepare alation user in RDS...Done
Check if lineage exists
No lineage exists
Create new alation user...
Build existing extension set
Create extensions pg_visibility,intarray,hstore,pglogical,amcheck if not exists ...
Create extensions pg_visibility,intarray,hstore,pglogical,amcheck ...Done
Import /tmp/lineage_schema_only.dump ...
Import /tmp/lineage_schema_only.dump ...Done
List target tables own by public and alation
Check if publication migration_lineage is created
migration_lineage is created
Validate publication table by table
All tables are included in public
Check if subscription migration_lineage already created
## create subscription migration_lineage connection 'host=<RDS IP> port=5432 dbname=lineage user=alation' publication migration_lineage
Stopping AgentWriter thread
Check on the progress of migrating the lineage database by running get_subscription_status.pyc. Replace <RDS_ENDPOINT> with the endpoint of your RDS instance and <ALATION_PASSWORD> with the Alation password you created earlier.
python /opt/alation/ops/postgres_rds_migration/rds_migration_scripts/get_subscription_status.pyc --rds_endpoint <RDS_ENDPOINT> --rds_port 5432 --new_alation_password <ALATION_PASSWORD> --target_database lineage
While the script is still running, you’ll see the following in the output:
## res: NEED_TO_WAIT
When the script is finished, you’ll see the following in the output:
## res: STOP_WAITING
When get_subscription_status.pyc shows ## res: STOP_WAITING, it automatically starts another script, create_indexes.py. This script creates indexes for all the tables in the lineage database. It runs in the background and must finish before you can move on to Step 6: Switch Alation to Use the RDS Instance below. You can check the status of this script by tailing the log file:
tail -f /tmp/create_indexes.log
When the process is done, you’ll see this in the log:
Created 2090/2091 indexes.
Created 2091/2091 indexes.
Creation of indexes /tmp/rosemeta_postdata_only.dump ...Done
Removed indexes dump file /tmp/rosemeta_postdata_only.dump since indexes have been applied.
Wait until the create_indexes.py script has finished indexing the tables for both rosemeta (item 8 above) and lineage (item 10 above). Then move on to the next step.
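If you want an additional sanity check on the index creation beyond /tmp/create_indexes.log, you can compare index counts between the internal databases and their RDS copies. The queries below are an optional illustration, not part of the official procedure:
# Indexes in the internal rosemeta database
psql -d rosemeta -U alation -c "SELECT count(*) FROM pg_indexes WHERE schemaname = 'public';"
# Indexes in the rosemeta copy on RDS (repeat both queries with -d lineage for the lineage database)
PGPASSWORD=<ALATION_PASSWORD> psql -h <RDS_ENDPOINT> -U alation -d rosemeta -c "SELECT count(*) FROM pg_indexes WHERE schemaname = 'public';"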
Step 6: Switch Alation to Use the RDS Instance¶
Important
This step requires about an hour of downtime.
If this process fails for any reason, you can reset the RDS instance using the steps under Reset the RDS Instance. Then try the migration again.
Now that all the data has been migrated to the RDS instance, you need to tell Alation to use the RDS instance.
Important
The first three steps require about one hour of downtime.
When the create_indexes.py script has finished indexing the tables for both rosemeta and lineage, you are ready to direct Alation to start using the copies that are now on the RDS instance.
Still as the alation user on the Alation server, stop Alation from running:
alation_supervisor stop celery:* ingestion customer-portal-agent java:* web:* lineage logical-metadata:*
Important
The next step requires downtime.
Run the post_postgres_migration.pyc script to tell Alation to start using the rosemeta database on the RDS instance. Replace the following placeholders:
<RDS_ENDPOINT> with the endpoint of your RDS instance
<ALATION_RDS_ADMIN_PASSWORD> with the RDS admin password you created earlier
<ALATION_PASSWORD> with the Alation password you created earlier
<NUM_CORES> with the number of CPUs you identified earlier
python /opt/alation/ops/postgres_rds_migration/rds_migration_scripts/post_postgres_migration.pyc --rds_port 5432 --rds_endpoint <RDS_ENDPOINT> --rds_admin_password <ALATION_RDS_ADMIN_PASSWORD> --new_alation_password <ALATION_PASSWORD> --num_of_threads <NUM_CORES> --target_database rosemeta
The output will be very lengthy. If it’s successful, you’ll see this at the end of the output:
Switching to RDS <YOUR RDS ENDPOINT>...Done
Important
The next step requires downtime.
Run the post_postgres_migration.pyc script to tell Alation to start using the lineage database on the RDS instance. Replace the following placeholders:
<RDS_ENDPOINT> with the endpoint of your RDS instance
<ALATION_RDS_ADMIN_PASSWORD> with the RDS admin password you created earlier
<ALATION_PASSWORD> with the Alation password you created earlier
<NUM_CORES> with the number of CPUs you identified earlier
python /opt/alation/ops/postgres_rds_migration/rds_migration_scripts/post_postgres_migration.pyc --rds_port 5432 --rds_endpoint <RDS_ENDPOINT> --rds_admin_password <ALATION_RDS_ADMIN_PASSWORD> --new_alation_password <ALATION_PASSWORD> --num_of_threads <NUM_CORES> --target_database lineage
If it’s successful, you’ll see this at the end of the output:
Switching to RDS <YOUR RDS ENDPOINT>...Done
Start Alation back up:
alation_supervisor start celery:* ingestion customer-portal-agent java:* web:* lineage logical-metadata:*
Alation should now be using the external RDS instance.
Stop the internal Postgres:
sudo -u postgres /usr/pgsql-13/bin/pg_ctl -D /data1/pgsql/13/data stop -w
Confirm that Alation is using the RDS instance:
alation_conf pgsql.config.host
alation_conf lineage-service.pgsql.config.host
Both pgsql.config.host and lineage-service.pgsql.config.host should show your RDS endpoint.
The process is now complete. You should be able to log into Alation and use it as usual.
After you’ve verified that the migration was successful and Alation is using the RDS instance without problems, you can now drop the data from the internal rosemeta and lineage databases. As a precaution, we recommend keeping the latest backups from before the migration.
Troubleshoot the Migration¶
If you encounter problems with migrating the Alation databases to an AWS RDS instance, please contact Alation Support.
Reset the RDS Instance¶
If you run into trouble during Step 5: Migrate the Data to the RDS Instance, you can reset the RDS instance and try again. To reset the RDS instance:
If you’re not on the Alation server already:
Use SSH to connect to the Alation server.
Enter the Alation shell using the following command:
sudo /etc/init.d/alation shell
Connect to the rosemeta database on the RDS instance:
export PGPASSWORD=<ALATION_PASSWORD>; psql -h <RDS endpoint> -U alation -d rosemeta
Then drop the migration subscription (named migration_rosemeta, as shown in the migration output earlier):
rosemeta=> drop subscription migration_rosemeta;
Connect to the lineage database on the RDS instance:
export PGPASSWORD=<ALATION_PASSWORD>; psql -h <RDS endpoint> -U alation -d lineage
Then drop the migration subscription (named migration_lineage):
lineage=> drop subscription migration_lineage;
Connect to the database on the RDS instance as the postgres user:
export PGPASSWORD=<ALATION_RDS_ADMIN_PASSWORD>; psql -h <RDS_ENDPOINT> -U postgres
Change the owner of the rosemeta database to the postgres user, then drop the database:
postgres=> alter database rosemeta owner to postgres;
postgres=> drop database rosemeta;
Change the owner of the lineage database to the postgres user, then drop the database:
postgres=> alter database lineage owner to postgres;
postgres=> drop database lineage;
If rosemeta or lineage is still being used, stop Alation and terminate the sessions:
alation_action stop_alation
postgres=> select pg_terminate_backend(pid) from pg_stat_activity where datname='rosemeta';
postgres=> select pg_terminate_backend(pid) from pg_stat_activity where datname='lineage';
Roll back the changes to alation_conf:
alation_conf pgsql.config.remote -s False
alation_conf pgsql.config.host -s /tmp
alation_conf pgsql.config.port -s 5432
alation_conf pgsql.config.password -c
alation_conf lineage-service.pgsql.config.host -s /tmp
alation_conf lineage-service.pgsql.password -c
You should now be ready to retry Step 5: Migrate the Data to the RDS Instance.
Maintain Alation with the Database on an AWS RDS Instance¶
Moving the database to an AWS RDS instance does not change how you upgrade, back up, or restore Alation: the commands remain the same. However, backup and restore behave somewhat differently behind the scenes, as described below.
Upgrade¶
There are no changes to how you upgrade Alation with the database(s) on an AWS RDS instance.
Back Up¶
After moving the database to AWS RDS, it will no longer be included directly in the Alation backup tar file. Instead, the Alation backup process creates an RDS snapshot in AWS using the existing RDS functionality. The snapshot is mapped to the Alation backup so it can be restored later along with Alation, if needed.
You can also use the Amazon RDS backup functionality separately from Alation backups.
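If you want to see which snapshots exist for the instance, the AWS CLI can list them. The identifier placeholder below is ours; substitute whatever your RDS instance is actually named (for instances created by the provided Terraform scripts, the name shown in the Terraform output earlier):
aws rds describe-db-snapshots --db-instance-identifier <RDS_INSTANCE_IDENTIFIER>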
Restore¶
When restoring Alation from a backup, the destructive_restore_all
action will automatically retrieve the AWS RDS snapshot that is mapped to the backup. The old AWS RDS instance will be stopped, and the new one will become available. In the AWS console, you will see both the old and newly restored RDS instances. The old one can be deleted if you want to save costs.
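If you choose to delete the old instance, you can do it from the AWS console or with the AWS CLI. For example (the identifier and snapshot name placeholders are ours, and taking a final snapshot first is a sensible precaution):
aws rds delete-db-instance --db-instance-identifier <OLD_RDS_INSTANCE_IDENTIFIER> --final-db-snapshot-identifier <FINAL_SNAPSHOT_NAME>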