Snowflake column-level data lineage

Discover and visualize lineage from Snowflake databases and scripts.

Extract DDL from the database

1. database:

show databases;

2. table, view

select
  '"' || t.table_catalog || '"' as dbName,
  '"' || t.table_schema || '"' as schemaName,
  '"' || t.table_name || '"' as tableName,
  case when t.table_type = 'VIEW' then 'true'
       when t.table_type = 'BASE TABLE' then 'false'
       else 'false'
  end as isView,
  '"' || c.column_name || '"' as columnName,
  c.data_type,
  null as comments
from
  "%s".information_schema.tables t,
  "%s".information_schema.columns c
where
  t.table_catalog = c.table_catalog
  and t.table_schema = c.table_schema
  and t.table_name = c.table_name
  and upper(t.table_schema) not in ('INFORMATION_SCHEMA')
order by t.table_catalog, t.table_schema, t.table_name, c.ordinal_position;
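
The %s placeholders in this query are substituted by grabit with the name of the database being extracted. For a hypothetical database named MYDB (an illustrative name, not part of the tool's output), the FROM clause would read:

from
  "MYDB".information_schema.tables t,
  "MYDB".information_schema.columns c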

3. source code of the view

SHOW VIEWS IN %s.%s;
SELECT GET_DDL('VIEW', '%s.%s.%s');

4. source code of the procedure

SHOW PROCEDURES IN %s.%s;
SELECT GET_DDL('PROCEDURE', '%s.%s.%s');

5. source code of the function:

SHOW FUNCTIONS IN %s.%s;
SELECT GET_DDL('FUNCTION', '%s.%s.%s');
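
With the placeholders filled in, a call for a hypothetical view MYDB.PUBLIC.V_EMP would look like this (MYDB, PUBLIC, and V_EMP are illustrative names only):

SHOW VIEWS IN MYDB.PUBLIC;
SELECT GET_DDL('VIEW', 'MYDB.PUBLIC.V_EMP');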

A minimum list of permissions needed to extract all DDL

You must define a role that has access to the database whose DDL you want to export, and assign WAREHOUSE privileges to that role. If SQLFlow_role and SQLFlow_user are the role and user that grabit uses to connect to the Snowflake database, you need to do the following:

1. First, create a role, such as SQLFlow_role

2. Next, use the ACCOUNTADMIN role to assign the required database, schema, view, and table privileges to SQLFlow_role

3. Next, create the user SQLFlow_user

4. Finally, grant the SQLFlow_role role to SQLFlow_user

use role accountadmin;

create or replace role SQLFlow_role;

// Grant privileges to use and select from your target warehouses / dbs / schemas / tables
grant operate, usage on warehouse <your-warehouse> to role SQLFlow_role;
grant usage on DATABASE <your-database> to role SQLFlow_role;
grant usage on all schemas in database <your-database> to role SQLFlow_role; 
grant select on all tables in database <your-database> to role SQLFlow_role; 
grant select on all external tables in database <your-database> to role SQLFlow_role;
grant select on all views in database <your-database> to role SQLFlow_role;

// Grant privileges for all future schemas and tables created in a warehouse 
grant usage on future schemas in database "<your-database>" to role SQLFlow_role;
grant select on future tables in database "<your-database>" to role SQLFlow_role;

// Create a new SQLFlow_user user and assign the SQLFlow role to it 
create user SQLFlow_user display_name = 'SQLFlow' password='<your-password>' default_role = SQLFlow_role default_warehouse = '<your-warehouse>';

// Grant the SQLFlow_role to the new SQLFlow_user user. 
grant role SQLFlow_role to user SQLFlow_user;
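
Before running grabit, you can verify the grants by switching to the new role and checking that the target objects are visible (a quick sanity check, using the same <your-warehouse> and <your-database> placeholders as above):

use role SQLFlow_role;
use warehouse <your-warehouse>;
show tables in database <your-database>;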

This represents the bare minimum privileges required to extract databases, schemas, views, and tables from Snowflake.

Enable extraction of table lineage

If you plan to enable extraction of table lineage via the include_table_lineage config flag, you'll also need to grant privileges to access the Snowflake Account Usage views. You must define a role that has access to the SNOWFLAKE database, and assign WAREHOUSE privileges to this role. You can execute the following using the ACCOUNTADMIN role to do so.

Grant the privileges to the role, for example:

use role accountadmin;
grant imported privileges on database snowflake to role SQLFlow_role;
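
You can confirm that the role can read the Account Usage views with a simple query; if the grant took effect, it returns a row count instead of an access error:

use role SQLFlow_role;
select count(*) from snowflake.account_usage.query_history;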

Using the grabit tool

  1. GUI Mode
  2. Command Line

Parameters used in grabit tool

hostname

The IP address or hostname of the database server that grabit connects to.

port

The port number of the database server that grabit connects to.

username

The database user used to log in to the database.

password

The password of the database user.

Note: the password can be encrypted using the [Encrypted password](#Encrypted password) tool; using an encrypted password is more secure.

privateKeyFile

Use a private key to connect.

privateKeyFilePwd

The passphrase of the private key file.

database

The name of the database instance to connect to; this parameter is optional.

extractedDbsSchemas

List of databases and schemas to extract, separated by commas, provided in the format database/schema or database.schema; leave it blank to extract all databases. For example: database1/schema1,database2/schema2,database3 or database1.schema1,database2.schema2,database3. When the database parameter is filled in, this parameter is treated as a schema. Wildcard characters are supported, such as database1/*, */schema, and */*.

for example:

extractedDbsSchemas: "MY/ADMIN"

excludedDbsSchemas

This parameter works on the result set filtered by extractedDbsSchemas. List of databases and schemas to exclude from extraction, separated by commas, e.g. database1/schema1,database2 or database1.schema1,database2. When the database parameter is filled in, this parameter is treated as a schema. Wildcard characters are supported, such as database1/*, */schema, and */*.

for example:

excludedDbsSchemas: "MY/*"

extractedStoredProcedures

A list of stored procedures under the specified database and schema to extract, separated by commas, provided in the format database.schema.procedureName or schema.procedureName; leave it blank to extract all databases. Expressions are supported. For example: database1.schema1.procedureName1,database2.schema2.procedureName2,database3.schema3,database4 or database1/schema1/procedureName1,database2/schema2.

for example:

extractedStoredProcedures: "database.scott.vEmp*"

or

extractedStoredProcedures: "database.scott"

extractedViews

A list of views under the specified database and schema to extract, separated by commas, provided in the format database.schema.viewName or schema.viewName; leave it blank to extract all databases. Expressions are supported. For example: database1.schema1.viewName1,database2.schema2.viewName2,database3.schema3,database4 or database1/schema1/viewName1,database2/schema2.

for example:

extractedViews: "database.scott.vEmp*"

or

extractedViews: "database.scott"

enableQueryHistory

Fetch SQL queries from the query history if set to true; the default is false.

Extract from the query history

This is the SQL query used to fetch queries from the Snowflake query history. Data from up to the last year can be extracted.

SELECT *
FROM "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY"
WHERE dateadd('mins', -%s, current_timestamp()) <= start_time
ORDER BY start_time;

Permissions needed to extract queries from the query history

You must define a role that has access to the SNOWFLAKE database, and assign WAREHOUSE privileges to this role. Please refer to: A minimum list of permissions needed to extract all DDL.

queryHistoryBlockOfTimeInMinutes

When enableQueryHistory is true, the block of time in the query history from which SQL queries are extracted; the default is 30 minutes.
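
For instance, with the default value of 30, grabit replaces the %s placeholder in the query-history SQL above with 30, limiting the extraction to queries started in the last 30 minutes:

SELECT *
FROM "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY"
WHERE dateadd('mins', -30, current_timestamp()) <= start_time
ORDER BY start_time;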

queryHistorySqlType

When enableQueryHistory is true, the DML type of SQL extracted from the query history. When empty, all types are extracted; when multiple types are specified, separate them with commas, e.g. SELECT,UPDATE,MERGE. Currently only the Snowflake database supports this parameter. Supported types are SHOW, SELECT, INSERT, UPDATE, DELETE, MERGE, CREATE TABLE, CREATE VIEW, CREATE PROCEDURE, and CREATE FUNCTION.

for example:

queryHistorySqlType: "SELECT,DELETE"

snowflakeDefaultRole

This value specifies the default role used when connecting to the Snowflake database.

sqlsourceTableName

If you save SQL queries in a specific table, one SQL query per row, grabit can fetch them from that table. For example, a table named query_table:

query_name   query_source
query1       create view v1 as select f1 from t1
query2       create view v2 as select f2 from t2
query3       create view v3 as select f3 from t3

Let's say query_table.query_source stores the source code of each query. We can use this query to fetch all SQL queries in the table:

select query_name as queryName, query_source as querySource from query_table

By setting the values of sqlsourceTableName, sqlsourceColumnQuerySource, and sqlsourceColumnQueryName, grabit can fetch all SQL queries in this table and send them to SQLFlow to analyze the lineage.

In this example,

"sqlsourceTableName":"query_table"
"sqlsourceColumnQuerySource":"query_source"
"sqlsourceColumnQueryName":"query_name"

Please leave sqlsourceTableName empty if you don't fetch SQL queries from a specific table.

sqlsourceColumnQuerySource

In the above sample:

"sqlsourceColumnQuerySource":"query_source"

sqlsourceColumnQueryName

"sqlsourceColumnQueryName":"query_name"

This parameter is optional; you don't need to specify a query name column if it doesn't exist in the table.

Example configuration file:

{
  "databaseServer":{
    "hostname":"127.0.0.1",
    "port":"443",
    "username":"USERNAME",
    "password":"PASSWORD",
    "privateKeyFile":"",
    "privateKeyFilePwd":"",
    "database":"",
    "extractedDbsSchemas":"MY/dbo",
    "excludedDbsSchemas":"",
    "extractedStoredProcedures":"",
    "extractedViews":"",
    "enableQueryHistory":true,
    "queryHistoryBlockOfTimeInMinutes":30,
    "snowflakeDefaultRole":"",
    "queryHistorySqlType":"",
    "sqlsourceTableName":"",
    "sqlsourceColumnQuerySource":"",
    "sqlsourceColumnQueryName":""
  },
  "SQLFlowServer":{
    "server":"http://127.0.0.1",
    "serverPort":"8081",
    "userId":"gudu|0123456789",
    "userSecret":""
  },
  "SQLScriptSource":"database",
  "lineageReturnFormat":"json",
  "databaseType":"snowflake"
}