Extraction Queries for Amazon Redshift¶
Alation Cloud Service Applies to Alation Cloud Service instances of Alation
Customer Managed Applies to customer-managed instances of Alation
Query-based extraction allows users to customize metadata extraction down to the level of specific metadata objects, such as tables, columns, views, and others by using custom queries.
Table, column, view, and function definition metadata types are enabled by default, and users cannot disable them.
By default, the following additional metadata types are enabled, but can be disabled by users:
Primary keys
Foreign keys
Functions
Users can disable the metadata types that are not required by clearing the corresponding checkboxes.
System schemas are disabled by default. Users can enable their extraction if required.
To use this feature, you will need to write custom queries to extract specific metadata. Alation expects that these queries conform to a required structure and use the expected reserved identifiers. Sections below contain the expected query structure for each metadata type.
Catalog¶
Make sure that your query has a column labeled as CATALOG
in the SELECT
list.
Example:
SELECT
distinct CURRENT_DATABASE() AS CATALOG
FROM
PG_NAMESPACE
WHERE
lower(nspname) NOT LIKE 'pg_%%'
AND lower(nspname) NOT IN ( 'information_schema')
AND lower(nspname) NOT IN ('''');
Schema¶
Make sure that your query has a column labeled as CATALOG
, SCHEMA
in the SELECT
list.
Example:
SELECT
CURRENT_DATABASE() AS CATALOG,
nspname AS SCHEMA
FROM
PG_NAMESPACE
WHERE
lower(nspname) NOT LIKE 'pg_%%'
AND lower(nspname) NOT IN ( 'information_schema')
AND lower(nspname) NOT IN ('''');
Table¶
Make sure that your query has columns labeled as CATALOG
, SCHEMA
, TABLE_NAME
, TABLE_OWNER ``, ``TABLE_TYPE
, and REMARKS
in the SELECT
list.
Example:
SELECT
*
FROM
(
SELECT
CURRENT_DATABASE() AS CATALOG,
A.SCHEMA AS SCHEMA,
A.TABLE_NAME,
A.TABLE_OWNER,
A.TABLE_TYPE,
pgd.description AS REMARKS
FROM
(
SELECT
T.SCHEMANAME AS SCHEMA,
T.TABLENAME AS TABLE_NAME,
T.TABLEOWNER AS TABLE_OWNER,
'TABLE' AS TABLE_TYPE
FROM
PG_TABLES T
UNION
SELECT
V.SCHEMANAME AS SCHEMA,
V.VIEWNAME AS VIEW_NAME,
V.VIEWOWNER AS TABLE_OWNER,
'VIEW' AS TABLE_TYPE
FROM
PG_VIEWS V ) A
LEFT JOIN PG_CATALOG.pg_statio_all_tables c on
(c.relname = A.table_name
and c.schemaname = A.SCHEMA)
LEFT JOIN PG_CATALOG.pg_description pgd on
(pgd.objoid = c.relid)
WHERE
A.SCHEMA NOT IN (
SELECT
schemaname
from
svv_external_schemas)
AND lower(A.SCHEMA) NOT LIKE 'pg_%%'
AND (pgd.objsubid = 0)
UNION
SELECT
CATALOG,
SCHEMA,
TABLE_NAME,
TABLE_OWNER,
TABLE_TYPE,
REMARKS
from
(
SELECT
CURRENT_DATABASE() AS CATALOG,
A.SCHEMA as SCHEMA,
A.TABLE_NAME as TABLE_NAME,
A.TABLE_OWNER as TABLE_OWNER,
A.TABLE_TYPE as TABLE_TYPE,
null AS REMARKS,
pgd.objoid as objoid
FROM
(
SELECT
T.SCHEMANAME AS SCHEMA,
T.TABLENAME AS TABLE_NAME,
T.TABLEOWNER AS TABLE_OWNER,
'TABLE' AS TABLE_TYPE
FROM
PG_TABLES T
UNION
SELECT
V.SCHEMANAME AS SCHEMA,
V.VIEWNAME AS VIEW_NAME,
V.VIEWOWNER AS TABLE_OWNER,
'VIEW' AS TABLE_TYPE
FROM
PG_VIEWS V ) A
LEFT JOIN PG_CATALOG.pg_statio_all_tables c on
(c.relname = A.table_name
and c.schemaname = A.SCHEMA)
LEFT JOIN PG_CATALOG.pg_description pgd on
(pgd.objoid = c.relid)
WHERE
A.SCHEMA NOT IN (
SELECT
schemaname
from
svv_external_schemas)
AND lower(A.SCHEMA) NOT LIKE 'pg_%%' ) tt
where
not exists(
select
1
from
PG_CATALOG.pg_description pt
where
pt.objoid = tt.objoid
and pt.objsubid = 0)
UNION
SELECT
CURRENT_DATABASE() AS CATALOG,
A.schemaname AS SCHEMA,
A.tablename AS TABLE_NAME,
null as TABLE_OWNER,
'TABLE' as TABLE_TYPE,
null as REMARKS
from
SVV_EXTERNAL_TABLES A
join svv_external_schemas B on
A.schemaname = B.schemaname )
WHERE
lower(SCHEMA) NOT IN ( 'information_schema')
AND lower(SCHEMA) NOT IN ('''')
AND TABLE_TYPE != 'VIEW';
View¶
Make sure that your query has columns labeled as CATALOG
, SCHEMA
, VIEW_NAME
, VIEW_CREATE_STATEMENT
, 'VIEW' AS VIEW_TYPE
, and REMARKS
in the SELECT
list.
Example:
SELECT
CURRENT_DATABASE() AS CATALOG,
schemaname AS SCHEMA,
viewname AS VIEW_NAME,
CASE
WHEN lower(substring(definition,1,100)) like 'create materialized view%' THEN definition
WHEN lower(substring(definition,1,100)) like 'create or replace materialized view%' THEN definition
WHEN lower(substring(definition,1,100)) like 'create view%' THEN definition
WHEN lower(substring(definition,1,100)) like 'create or replace view%' THEN definition
ELSE 'CREATE OR REPLACE VIEW ' || SCHEMA || '.' || VIEW_NAME || ' AS ' || definition
END AS VIEW_CREATE_STATEMENT,
'VIEW' AS VIEW_TYPE,
VIEWOWNER AS TABLE_OWNER,
null AS REMARKS
FROM
PG_VIEWS
WHERE
lower(schemaname) NOT LIKE 'pg_%%'
AND lower(schemaname) NOT IN ('information_schema')
AND lower(schemaname) NOT IN ('''');
Column¶
Make sure that your query has columns labeled as CATALOG
, SCHEMA
, TABLE_NAME
, TYPE_NAME
, DATA_TYPE
, COLUMN_NAME
, ORDINAL_POSITION
, IS_NULLABLE
, REMARKS
, and COLUMN_DEFAULT
in the SELECT
list.
Example:
SELECT
table_catalog AS CATALOG,
table_schema AS SCHEMA,
table_name,
column_name,
data_type,
CASE
WHEN data_type = 'character varying' THEN 'varchar'
WHEN data_type = 'timestamp without time zone' THEN 'timestamp'
WHEN data_type = 'timestamp with time zone' THEN 'timestamptz'
WHEN data_type = 'double precision' THEN 'FLOAT'
ELSE data_type
END as type_name,
CASE
WHEN data_type = 'character varying' THEN character_maximum_length
WHEN data_type = 'numeric' THEN numeric_precision
WHEN data_type = 'character' THEN character_maximum_length
ELSE -1
END AS length,
numeric_scale,
ordinal_position,
is_nullable,
column_default,
remarks
FROM
svv_columns
WHERE
lower(SCHEMA) NOT LIKE 'pg_%%'
AND lower(SCHEMA) NOT IN ('information_schema')
AND lower(SCHEMA) NOT IN ('''');
Primary Key¶
Make sure that your query has columns labeled as CATALOG
, SCHEMA
, TABLE_NAME
, COLUMN_NAME
and KEY_SEQ
in the SELECT
list.
Example:
SELECT
CURRENT_DATABASE() AS CATALOG,
S.NSPNAME AS SCHEMA,
C.RELNAME AS TABLE_NAME,
CONSTR.CONNAME AS PK_NAME,
ATTR.ATTNAME AS COLUMN_NAME,
ATTR.ATTNUM AS KEY_SEQ
FROM
PG_CONSTRAINT CONSTR
INNER JOIN PG_ATTRIBUTE ATTR ON
ATTR.ATTNUM = ANY (CONSTR.CONKEY)
AND ATTR.ATTRELID = CONSTR.CONRELID
INNER JOIN PG_CLASS C ON
CONSTR.CONRELID = C.OID
INNER JOIN PG_NAMESPACE S ON
C.RELNAMESPACE = S.OID
WHERE
lower(S.NSPNAME) NOT LIKE 'pg_%%'
AND lower(S.NSPNAME) NOT IN ('''')
AND lower(S.NSPNAME) NOT IN ('information_schema')
AND CONSTR.CONTYPE = 'p';
Foreign Key¶
Make sure that your query has columns labeled as PK_CATALOG
, PK_SCHEMA
, PK_TABLE
, PK_COLUMN
, FK_CATALOG
, FK_SCHEMA
, FK_TABLE
, and FK_COLUMN
in the SELECT
list.
Example:
SELECT
CURRENT_DATABASE() AS FK_CATALOG,
CURRENT_DATABASE() AS PK_CATALOG,
F.FKTABLE_SCHEM AS FK_SCHEMA,
F.FKTABLE_NAME AS FK_TABLE,
F.FK_NAME,
F.FKCOLUMN_NAME AS FK_COLUMN,
F.KEY_SEQ,
P.PKTABLE_SCHEM AS PK_SCHEMA,
P.PKTABLE_NAME AS PK_TABLE,
P.PK_NAME,
P.PKCOLUMN_NAME AS PK_COLUMN
FROM
(
SELECT
S.NSPNAME AS FKTABLE_SCHEM,
C.RELNAME AS FKTABLE_NAME,
CONSTR.CONNAME AS FK_NAME,
ATTR.ATTNAME AS FKCOLUMN_NAME,
ATTR.ATTNUM AS KEY_SEQ,
CONSTR.CONFKEY,
CONSTR.CONFRELID
FROM
PG_CONSTRAINT CONSTR
INNER JOIN PG_ATTRIBUTE ATTR ON
ATTR.ATTNUM = ANY (CONSTR.CONKEY)
AND ATTR.ATTRELID = CONSTR.CONRELID
INNER JOIN PG_CLASS C ON
CONSTR.CONRELID = C.OID
INNER JOIN PG_NAMESPACE S ON
C.RELNAMESPACE = S.OID
AND CONSTR.CONTYPE = 'f'
WHERE
lower(S.NSPNAME) NOT LIKE 'pg_%%'
AND lower(S.NSPNAME) NOT IN ('''')
AND lower(S.NSPNAME) NOT IN ( 'information_schema')
) F
INNER JOIN (
SELECT
S.NSPNAME AS PKTABLE_SCHEM,
C.RELNAME AS PKTABLE_NAME,
CONSTR.CONNAME AS PK_NAME,
ATTR.ATTNAME AS PKCOLUMN_NAME,
ATTR.ATTNUM AS PKEY_SEQ,
C.OID
FROM
PG_CONSTRAINT CONSTR
INNER JOIN PG_ATTRIBUTE ATTR ON
ATTR.ATTNUM = ANY (CONSTR.CONKEY)
AND ATTR.ATTRELID = CONSTR.CONRELID
INNER JOIN PG_CLASS C ON
CONSTR.CONRELID = C.OID
INNER JOIN PG_NAMESPACE S ON
C.RELNAMESPACE = S.OID
AND CONSTR.CONTYPE = 'p'
WHERE
lower(S.NSPNAME) NOT LIKE 'pg_%%'
AND lower(S.NSPNAME) NOT IN ('''')
AND lower(S.NSPNAME) NOT IN ( 'information_schema')
) P ON
P.PKEY_SEQ = ANY (F.CONFKEY)
AND P.OID = F.CONFRELID;
Function¶
Make sure that your query has columns labeled as CATALOG
, SCHEMA
, FUNCTION_NAME
, and REMARKS
in the SELECT
list.
Example:
SELECT
CURRENT_DATABASE() AS CATALOG,
S.NSPNAME AS SCHEMA,
P.PRONAME AS FUNCTION_NAME,
P.PROSRC AS FUNCTION_DEFINITION,
NULL AS SPECIFIC_NAME,
NULL AS REMARKS
FROM
PG_PROC P
INNER JOIN PG_NAMESPACE S ON
P.PRONAMESPACE = S.OID
INNER JOIN PG_LANGUAGE L ON
P.PROLANG = L.OID
WHERE
(L.LANNAME IN ('internal', 'sql')
AND p.prorettype <> 'pg_catalog.trigger'::pg_catalog.regtype
OR L.LANNAME IN ('plpythonu'))
AND lower(S.NSPNAME) NOT IN ('''')
AND lower(S.NSPNAME) NOT LIKE 'pg_%%'
AND lower(S.NSPNAME) NOT IN ( 'information_schema');
Function Definition¶
Make sure that your query has columns labeled as CATALOG
, SCHEMA
, FUNCTION_NAME
, ARG_NAME
, TYPE_NAME
, ARG_TYPE
, and ARG_DEF
in the SELECT
list.
Example:
SELECT
CURRENT_DATABASE() AS CATALOG,
N.NSPNAME AS SCHEMA,
P.PRONAME AS FUNCTION_NAME,
ISNULL(P.PROARGNAMES,'{}') AS ARG_NAME,
P.PRONARGS AS NUM_OF_FUNCTION_ARGS,
ISNULL(STRING_TO_ARRAY(REPLACE(OIDVECTORTYPES(P.PROARGTYPES), '\"', ''), ','),'{}') AS ARG_TYPE,
FORMAT_TYPE(P.PRORETTYPE, NULL) AS TYPE_NAME,
P.PROSRC AS ARG_DEF
FROM
PG_PROC P
INNER JOIN PG_NAMESPACE N ON
P.PRONAMESPACE = N.OID
INNER JOIN PG_LANGUAGE L ON
P.PROLANG = L.OID
WHERE
(L.LANNAME IN ('internal',
'sql')
AND p.prorettype <> 'pg_catalog.trigger'::pg_catalog.regtype
OR L.LANNAME IN ('plpythonu'))
AND lower(N.NSPNAME) NOT IN ('''')
AND lower(N.NSPNAME) NOT LIKE 'pg_%%'
AND lower(N.NSPNAME) NOT IN ('information_schema');