first commit

j 2024-10-23 17:52:59 +08:00
commit 979e222e68
380 changed files with 40774 additions and 0 deletions

26
.gitignore vendored Normal file
View File

@ -0,0 +1,26 @@
# Compiled class file
*.class
# Log file
*.log
# BlueJ files
*.ctxt
# Mobile Tools for Java (J2ME)
.mtj.tmp/
# Package Files #
*.jar
*.war
*.nar
*.ear
*.zip
*.tar.gz
*.rar
.DS_Store
# virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml
hs_err_pid*

201
LICENSE Normal file
View File

@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

52
README.md Normal file
View File

@ -0,0 +1,52 @@
## [SQLFlow](https://sqlflow.gudusoft.com) - A tool that tracks column-level data lineage
Track column-level data lineage for [more than 20 major databases](/databases/readme.md), including
Snowflake, Hive, SparkSQL, Teradata, Oracle, SQL Server, AWS Redshift, BigQuery, etc.
Build and visualize lineage from SQL scripts collected from query history, ETL scripts,
GitHub/Bitbucket, the local filesystem, and remote databases.
Explore lineage [in an interactive diagram](https://sqlflow.gudusoft.com) or programmatically via [Restful APIs](/api) or [SDKs](https://www.gudusoft.com/sqlflow-java-library-2/).
Discover data lineage in this query:
```sql
insert into emp (id,first_name,last_name,city,postal_code,ph)
select a.id,a.first_name,a.last_name,a.city,a.postal_code,b.ph
from emp_addr a
inner join emp_ph b on a.id = b.id;
```
SQLFlow presents a clean graph that shows you
where the data came from, what transformations it underwent along the way,
and what other data items are derived from it; a sample of the underlying JSON appears below the diagram.
[![SQLFlow Introduce](images/sqlflow_introduce1.png)](https://sqlflow.gudusoft.com)
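Programmatically, the same lineage is available as structured data. The snippet below is only an illustrative sketch of the kind of relationship records the REST API returns for the query above; the field names follow the sample response embedded in `api/java/DataLineageParser.java`, and the real payload carries additional ids and coordinates.
```json
{
  "data": {
    "sqlflow": {
      "relationship": [
        {
          "type": "fdd",
          "sources": [ { "parentName": "emp_addr", "column": "first_name" } ],
          "target": { "parentName": "emp", "column": "first_name" }
        }
      ]
    }
  }
}
```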
### What SQLFlow can do for you
- Scan your database and discover the data lineage instantly.
- Automatically collect SQL scripts from GitHub/Bitbucket or the local file system.
- Provide a clean diagram to the end user to understand the data lineage quickly.
- Programmatically retrieve lineage in CSV, JSON, or GraphML format via [Restful APIs](/api) or [SDKs](https://www.gudusoft.com/sqlflow-java-library-2/).
- Incorporate the lineage metadata decoded from complex SQL scripts into your own metadata database for further processing.
- Visualize the metadata already existing in your database to release the power of data.
- Perform impact analysis and root-cause analysis by tracing lineage backwards or forwards with a few mouse clicks.
- Process SQL scripts from more than 20 major database vendors.
### How to use SQLFlow
- Open [the official website](https://gudusoft.com/sqlflow/#/) of SQLFlow and paste your SQL script or metadata to get a clean lineage diagram.
- Call the [Restful API](/api) of SQLFlow in your own code to get the data lineage metadata that SQLFlow decodes from the SQL script; a minimal curl sketch follows this list.
- The [on-premise version](https://github.com/sqlparser/sqlflow_public/blob/master/install_sqlflow.md) of SQLFlow lets you run it on your own server to keep your data safer.
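As a quick sketch of the API route, the two REST calls used by `api/firstdemo.py` look roughly like this (endpoints and form fields are taken from that demo; replace the placeholder ids with your own):
```
# 1. exchange your userId/secretKey for a token
curl -X POST "https://api.gudusoft.com/gspLive_backend/user/generateToken" \
     -H "Content-Type:application/x-www-form-urlencoded;charset=UTF-8" \
     -d "secretKey=YOUR_SECRET_KEY" -d "userId=YOUR_USER_ID"

# 2. submit a SQL statement and receive the lineage as JSON
curl -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/generation/sqlflow?showRelationType=fdd" \
     -H "Content-Type:multipart/form-data" \
     -F "dbvendor=dbvoracle" -F "sqltext=insert into t2 select * from t1" \
     -F "userId=YOUR_USER_ID" -F "token=YOUR_TOKEN"
```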
### Restful APIs
- [SQLFlow API document](https://github.com/sqlparser/sqlflow_public/blob/master/api/sqlflow_api.md)
- [Client in C#](https://github.com/sqlparser/sqlflow_public/tree/master/api/client/csharp)
### SQLFlow architecture
- [Architecture document](sqlflow_architecture.md)
### User manual and FAQ
- [User guide](sqlflow_guide.md)
- [SQLFlow FAQ](sqlflow_faq.md)

50
README_CN.md Normal file
View File

@ -0,0 +1,50 @@
## 1. What is SQLFlow
The data in a database view comes from tables or other views, and a column in a view may be produced by aggregating multiple columns from multiple tables.
The data in a table may in turn be imported from external systems through ETL. The path that data follows from its source, through each processing step, to its destination is called [data lineage](https://en.wikipedia.org/wiki/Data_lineage).
[SQLFlow](https://sqlflow.gudusoft.com/) derives the complete data lineage by analyzing the definitions (DDL) of database objects, DML statements, stored procedures and functions used in ETL/ELT,
triggers, and other SQL scripts.
In a large data warehouse, complete data lineage can be used for data tracing, impact analysis of table and column changes, proof of data compliance, data quality checks, and more.
For example, you may ask: which subsystems (procurement, production, sales, and so on) supply the data that is aggregated into the figures in the financial report?
When the data structures (tables and columns) of one subsystem, say the sales subsystem, change, could other subsystems be affected?
Do the tables and columns of the financial reporting subsystem need to change accordingly?
SQLFlow helps you answer these questions by presenting the relationships visually, so the data flow across your organization's IT systems is clear at a glance.
![SQLFlow Introduce](images/sqlflow_introduce1.png)
## 2. How SQLFlow works
1. Collect SQL scripts from databases, version control systems, and file systems.
2. Parse the SQL scripts, analyze the relationships among the database objects they reference, and build the data lineage.
3. Present the data lineage in various forms, including an interactive UI and CSV, JSON, and GraphML formats.
## 3. SQLFlow components
1. Backend: a set of Java programs responsible for SQL parsing, data lineage analysis, layout of the visual elements, authentication, and so on.
2. Frontend: JavaScript and HTML code responsible for submitting SQL and visualizing the data lineage.
3. [Grabit tool](https://www.gudusoft.com/grabit/): a Java program that collects SQL scripts from databases, version control systems, and file systems and submits them to the backend for lineage analysis.
4. [Restful API](https://github.com/sqlparser/sqlflow_public/tree/master/api): a complete set of APIs that lets you interact with the backend from Java, C#, Python, PHP, and other languages to perform data lineage analysis.
![SQLFlow Components](https://github.com/sqlparser/sqlflow_public/raw/master/sqlflow_components.png)
## 4. Using SQLFlow
1. Open [the SQLFlow frontend](https://sqlflow.gudusoft.com/) in a browser.
2. Upload SQL text or files in the browser.
3. Click the analyze button and view the visualized data lineage.
4. Interactively explore the complete lineage graph of a specific table or view in the browser.
5. Use the grabit tool or the API to submit the SQL files to be processed, then view the results in the browser or post-process the returned results in your own code (a minimal example follows this list).
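For step 5, a minimal grabit invocation against the SQLFlow Cloud Server looks like this (the command form is taken from api/java/readme.md; fill in your own user id and secret key):
```
java -jar grabit-java.jar /s https://api.gudusoft.com /u 'YOUR_USER_ID' /k YOUR_SECRET_KEY /t oracle /f demo.sql /r 1
```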
## 5. Limitations of SQLFlow
SQLFlow derives data lineage solely by analyzing SQL scripts, including stored procedures, functions, and triggers.
ETL pipelines, however, often rely on many other technologies and tools, and the lineage they produce is currently beyond what SQLFlow can detect.
## 6. Learn more about SQLFlow
1. Supports as many as 21 major databases
2. [Architecture document](sqlflow_architecture.md)

344
api/csharp/.gitignore vendored Normal file
View File

@ -0,0 +1,344 @@
## Ignore Visual Studio temporary files, build results, and
## files generated by popular Visual Studio add-ons.
##
## Get latest from https://github.com/github/gitignore/blob/master/VisualStudio.gitignore
# User-specific files
*.rsuser
*.suo
*.user
*.userosscache
*.sln.docstates
# User-specific files (MonoDevelop/Xamarin Studio)
*.userprefs
# Build results
[Dd]ebug/
[Dd]ebugPublic/
[Rr]elease/
[Rr]eleases/
x64/
x86/
[Aa][Rr][Mm]/
[Aa][Rr][Mm]64/
bld/
[Bb]in/
[Oo]bj/
[Ll]og/
# Visual Studio 2015/2017 cache/options directory
.vs/
# Uncomment if you have tasks that create the project's static files in wwwroot
#wwwroot/
# Visual Studio 2017 auto generated files
Generated\ Files/
# MSTest test Results
[Tt]est[Rr]esult*/
[Bb]uild[Ll]og.*
# NUNIT
*.VisualState.xml
TestResult.xml
# Build Results of an ATL Project
[Dd]ebugPS/
[Rr]eleasePS/
dlldata.c
# Benchmark Results
BenchmarkDotNet.Artifacts/
# .NET Core
project.lock.json
project.fragment.lock.json
artifacts/
# StyleCop
StyleCopReport.xml
# Files built by Visual Studio
*_i.c
*_p.c
*_h.h
*.ilk
*.meta
*.obj
*.iobj
*.pch
*.pdb
*.ipdb
*.pgc
*.pgd
*.rsp
*.sbr
*.tlb
*.tli
*.tlh
*.tmp
*.tmp_proj
*_wpftmp.csproj
*.log
*.vspscc
*.vssscc
.builds
*.pidb
*.svclog
*.scc
# Chutzpah Test files
_Chutzpah*
# Visual C++ cache files
ipch/
*.aps
*.ncb
*.opendb
*.opensdf
*.sdf
*.cachefile
*.VC.db
*.VC.VC.opendb
# Visual Studio profiler
*.psess
*.vsp
*.vspx
*.sap
# Visual Studio Trace Files
*.e2e
# TFS 2012 Local Workspace
$tf/
# Guidance Automation Toolkit
*.gpState
# ReSharper is a .NET coding add-in
_ReSharper*/
*.[Rr]e[Ss]harper
*.DotSettings.user
# JustCode is a .NET coding add-in
.JustCode
# TeamCity is a build add-in
_TeamCity*
# DotCover is a Code Coverage Tool
*.dotCover
# AxoCover is a Code Coverage Tool
.axoCover/*
!.axoCover/settings.json
# Visual Studio code coverage results
*.coverage
*.coveragexml
# NCrunch
_NCrunch_*
.*crunch*.local.xml
nCrunchTemp_*
# MightyMoose
*.mm.*
AutoTest.Net/
# Web workbench (sass)
.sass-cache/
# Installshield output folder
[Ee]xpress/
# DocProject is a documentation generator add-in
DocProject/buildhelp/
DocProject/Help/*.HxT
DocProject/Help/*.HxC
DocProject/Help/*.hhc
DocProject/Help/*.hhk
DocProject/Help/*.hhp
DocProject/Help/Html2
DocProject/Help/html
# Click-Once directory
publish/
# Publish Web Output
*.[Pp]ublish.xml
*.azurePubxml
# Note: Comment the next line if you want to checkin your web deploy settings,
# but database connection strings (with potential passwords) will be unencrypted
*.pubxml
*.publishproj
!linux.pubxml
!osx.pubxml
!win.pubxml
# Microsoft Azure Web App publish settings. Comment the next line if you want to
# checkin your Azure Web App publish settings, but sensitive information contained
# in these scripts will be unencrypted
PublishScripts/
# NuGet Packages
*.nupkg
# The packages folder can be ignored because of Package Restore
**/[Pp]ackages/*
# except build/, which is used as an MSBuild target.
!**/[Pp]ackages/build/
# Uncomment if necessary however generally it will be regenerated when needed
#!**/[Pp]ackages/repositories.config
# NuGet v3's project.json files produces more ignorable files
*.nuget.props
*.nuget.targets
# Microsoft Azure Build Output
csx/
*.build.csdef
# Microsoft Azure Emulator
ecf/
rcf/
# Windows Store app package directories and files
AppPackages/
BundleArtifacts/
Package.StoreAssociation.xml
_pkginfo.txt
*.appx
# Visual Studio cache files
# files ending in .cache can be ignored
*.[Cc]ache
# but keep track of directories ending in .cache
!?*.[Cc]ache/
# Others
ClientBin/
~$*
*~
*.dbmdl
*.dbproj.schemaview
*.jfm
*.pfx
*.publishsettings
orleans.codegen.cs
# Including strong name files can present a security risk
# (https://github.com/github/gitignore/pull/2483#issue-259490424)
#*.snk
# Since there are multiple workflows, uncomment next line to ignore bower_components
# (https://github.com/github/gitignore/pull/1529#issuecomment-104372622)
#bower_components/
# ASP.NET Core default setup: bower directory is configured as wwwroot/lib/ and bower restore is true
**/wwwroot/lib/
# RIA/Silverlight projects
Generated_Code/
# Backup & report files from converting an old project file
# to a newer Visual Studio version. Backup files are not needed,
# because we have git ;-)
_UpgradeReport_Files/
Backup*/
UpgradeLog*.XML
UpgradeLog*.htm
ServiceFabricBackup/
*.rptproj.bak
# SQL Server files
*.mdf
*.ldf
*.ndf
# Business Intelligence projects
*.rdl.data
*.bim.layout
*.bim_*.settings
*.rptproj.rsuser
# Microsoft Fakes
FakesAssemblies/
# GhostDoc plugin setting file
*.GhostDoc.xml
# Node.js Tools for Visual Studio
.ntvs_analysis.dat
node_modules/
# Visual Studio 6 build log
*.plg
# Visual Studio 6 workspace options file
*.opt
# Visual Studio 6 auto-generated workspace file (contains which files were open etc.)
*.vbw
# Visual Studio LightSwitch build output
**/*.HTMLClient/GeneratedArtifacts
**/*.DesktopClient/GeneratedArtifacts
**/*.DesktopClient/ModelManifest.xml
**/*.Server/GeneratedArtifacts
**/*.Server/ModelManifest.xml
_Pvt_Extensions
# Paket dependency manager
.paket/paket.exe
paket-files/
# FAKE - F# Make
.fake/
# JetBrains Rider
.idea/
*.sln.iml
# CodeRush personal settings
.cr/personal
# Python Tools for Visual Studio (PTVS)
__pycache__/
*.pyc
# Cake - Uncomment if you are using it
# tools/**
# !tools/packages.config
# Tabs Studio
*.tss
# Telerik's JustMock configuration file
*.jmconfig
# BizTalk build output
*.btp.cs
*.btm.cs
*.odx.cs
*.xsd.cs
# OpenCover UI analysis results
OpenCover/
# Azure Stream Analytics local run output
ASALocalRun/
# MSBuild Binary and Structured Log
*.binlog
# NVidia Nsight GPU debugger configuration file
*.nvuser
# MFractors (Xamarin productivity tool) working folder
.mfractor/
# Local History for Visual Studio
.localhistory/
# BeatPulse healthcheck temp database
healthchecksdb

View File

@ -0,0 +1,25 @@

Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio Version 16
VisualStudioVersion = 16.0.29503.13
MinimumVisualStudioVersion = 10.0.40219.1
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "SQLFlowClient", "SQLFlowClient\SQLFlowClient.csproj", "{8F80B6E9-F33B-4936-8111-48A9BCA9AEDC}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
Release|Any CPU = Release|Any CPU
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{8F80B6E9-F33B-4936-8111-48A9BCA9AEDC}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{8F80B6E9-F33B-4936-8111-48A9BCA9AEDC}.Debug|Any CPU.Build.0 = Debug|Any CPU
{8F80B6E9-F33B-4936-8111-48A9BCA9AEDC}.Release|Any CPU.ActiveCfg = Release|Any CPU
{8F80B6E9-F33B-4936-8111-48A9BCA9AEDC}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {3C9B0DCD-4A60-4E0C-9A35-7211C074B0D1}
EndGlobalSection
EndGlobal

1
api/csharp/SQLFlowClient/.gitignore vendored Normal file
View File

@ -0,0 +1 @@
dist

View File

@ -0,0 +1,14 @@
using System;
using System.Collections.Generic;
using System.Text;
namespace SQLFlowClient
{
class Config
{
public string Host { get; set; }
public string Token { get; set; }
public string SecretKey { get; set; }
public string UserId { get; set; }
}
}

View File

@ -0,0 +1,31 @@
using System;
using System.Collections.Generic;
using System.Text;
using System.ComponentModel;
namespace SQLFlowClient
{
public enum DBVendor
{
bigquery,
couchbase,
db2,
greenplum,
hana ,
hive,
impala ,
informix,
mdx,
mysql,
netezza,
openedge,
oracle,
postgresql,
redshift,
snowflake,
mssql,
sybase,
teradata,
vertica,
}
}

View File

@ -0,0 +1,246 @@
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using System.Linq;
using System.Net.Http.Headers;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using System.IO;
using System.Diagnostics;
namespace SQLFlowClient
{
public static class HttpService
{
private static Config config;
public static async Task Request(Options options)
{
config = new Config
{
Host = "https://api.gudusoft.com",
Token = "",
UserId = "gudu|0123456789",
};
try
{
if (File.Exists("./config.json"))
{
var json = JObject.Parse(File.ReadAllText("./config.json"));
if (!string.IsNullOrWhiteSpace(json["Host"]?.ToString()))
{
config.Host = json["Host"].ToString();
}
if (!string.IsNullOrWhiteSpace(json["Token"]?.ToString()))
{
config.Token = json["Token"].ToString();
}
if (!string.IsNullOrWhiteSpace(json["SecretKey"]?.ToString()))
{
config.SecretKey = json["SecretKey"].ToString();
}
if (!string.IsNullOrWhiteSpace(json["UserId"]?.ToString()))
{
config.UserId = json["UserId"].ToString();
}
}
}
catch (Exception e)
{
Console.WriteLine($"Invalid config.json :\n{e.Message}");
return;
}
//if (!string.IsNullOrWhiteSpace(options.Token))
//{
// config.Token = options.Token;
//}
//if (!string.IsNullOrWhiteSpace(options.UserId))
//{
// config.UserId = options.UserId;
//}
//if (!string.IsNullOrWhiteSpace(options.SecretKey))
//{
// config.SecretKey = options.SecretKey;
//}
if (options.Version)
{
await Version();
}
else
{
await SQLFlow(options);
}
}
public static async Task SQLFlow(Options options)
{
StreamContent sqlfile;
if (options.SQLFile == null)
{
Console.WriteLine($"Please specify an input file. (e.g. SQLFlowClient test.sql)");
return;
}
try
{
string path = Path.GetFullPath(options.SQLFile);
sqlfile = new StreamContent(File.Open(options.SQLFile, FileMode.Open));
}
catch (Exception e)
{
Console.WriteLine($"Open file failed.\n{e.Message}");
return;
}
var types = options.ShowRelationType.Split(",")
.Where(p => Enum.GetNames(typeof(RelationType)).FirstOrDefault(t => t.ToLower() == p.ToLower()) == null)
.ToList();
if (types.Count != 0)
{
Console.WriteLine($"Wrong relation type : { string.Join(",", types) }.\nIt should be one or more from the following list : fdd, fdr, frd, fddi, join");
return;
}
string dbvendor = Enum.GetNames(typeof(DBVendor)).FirstOrDefault(p => p.ToLower() == options.DBVendor.ToLower());
if (dbvendor == null)
{
Console.WriteLine($"Wrong database vendor : {options.DBVendor}.\nIt should be one of the following list : " +
$"bigquery, couchbase, db2, greenplum, hana , hive, impala , informix, mdx, mysql, netezza, openedge," +
$" oracle, postgresql, redshift, snowflake, mssql, sybase, teradata, vertica");
return;
}
if (!string.IsNullOrWhiteSpace(config.SecretKey) && !string.IsNullOrWhiteSpace(config.UserId))
{
// request token
string url2 = $"{config.Host}/gspLive_backend/user/generateToken";
using var client2 = new HttpClient();
using var response2 = await client2.PostAsync(url2, content: new FormUrlEncodedContent(new List<KeyValuePair<string, string>>
{
new KeyValuePair<string, string>("userId", config.UserId),
new KeyValuePair<string, string>("secretKey", config.SecretKey)
}));
if (response2.IsSuccessStatusCode)
{
var text = await response2.Content.ReadAsStringAsync();
var jobject = JObject.Parse(text);
var json = jobject.ToString();
var code = jobject.SelectToken("code");
if (code?.ToString() == "200")
{
config.Token = jobject.SelectToken("token").ToString();
}
else
{
Console.WriteLine($"{url2} error, code={code?.ToString() }");
return;
}
}
else
{
Console.WriteLine($"Wrong response code {(int)response2.StatusCode} {response2.StatusCode}.url={url2}");
return;
}
}
var form = new MultipartFormDataContent{
{ sqlfile , "sqlfile" , "sqlfile" },
{ new StringContent("dbv"+dbvendor) , "dbvendor" },
{ new StringContent(options.ShowRelationType) , "showRelationType" },
{ new StringContent(options.SimpleOutput.ToString()) , "simpleOutput" },
{ new StringContent(options.IgnoreRecordSet.ToString()) , "ignoreRecordSet" },
{ new StringContent(options.ignoreFunction.ToString()) , "ignoreFunction" },
{ new StringContent(config.UserId) , "userId" },
{ new StringContent(config.Token) , "token" },
};
try
{
var stopWatch = Stopwatch.StartNew();
string url = $"{config.Host}/gspLive_backend/sqlflow/generation/sqlflow/" + (options.IsGraph ? "graph" : "");
using var client = new HttpClient();
// client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Token", config.Token);
using var response = await client.PostAsync(url, form);
if (response.IsSuccessStatusCode)
{
stopWatch.Stop();
var text = await response.Content.ReadAsStringAsync();
var result = new SQLFlowResult(text);
if (result.data && result.dbobjs || result.data && result.sqlflow && result.graph)
{
if (options.Output != "")
{
try
{
File.WriteAllText(Path.GetFullPath(options.Output), result.json);
Console.WriteLine($"Output has been saved to {options.Output}.");
}
catch (Exception e)
{
Console.WriteLine($"Save File failed.{e.Message}");
}
}
else
{
Console.WriteLine(result.json ?? "");
}
}
if (result.error)
{
Console.WriteLine($"Success with some errors.Executed in {stopWatch.Elapsed.TotalSeconds.ToString("0.00")} seconds by host {config.Host}.");
}
else
{
Console.WriteLine($"Success.Executed in {stopWatch.Elapsed.TotalSeconds.ToString("0.00")} seconds by host {config.Host}.");
}
}
else
{
Console.WriteLine($"Wrong response code {(int)response.StatusCode} {response.StatusCode}.");
}
}
catch (Exception e)
{
Console.WriteLine($"An unknonwn exeception occurs :\n{e.Message}");
}
}
public static async Task Version()
{
try
{
string url = $"{config.Host}/gspLive_backend/version";
using var client = new HttpClient();
client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Token", config.Token);
var form = new MultipartFormDataContent{
{ new StringContent(config.UserId) , "userId" },
};
using var response = await client.PostAsync(url, form);
if (response.IsSuccessStatusCode)
{
var text = await response.Content.ReadAsStringAsync();
var json = JObject.Parse(text);
var gsp = new
{
ReleaseDate = json.SelectToken("version.gsp.['release.date']")?.ToString(),
version = json.SelectToken("version.gsp.version")?.ToString(),
};
var backend = new
{
ReleaseDate = json.SelectToken("version.backend.['release.date']")?.ToString(),
version = json.SelectToken("version.backend.version")?.ToString(),
};
Console.WriteLine(" version relase date");
Console.WriteLine("SQLFlowClient 1.2.0 2020/12/13");
Console.WriteLine($"gsp {gsp.version} {gsp.ReleaseDate}");
Console.WriteLine($"backend {backend.version} {backend.ReleaseDate}");
}
else
{
Console.WriteLine($"Not connected.Wrong response code {(int)response.StatusCode} {response.StatusCode}.");
}
}
catch (Exception e)
{
Console.WriteLine($"An unknonwn exeception occurs :\n{e.Message}");
}
}
}
}

View File

@ -0,0 +1,73 @@
using System;
using CommandLine;
using CommandLine.Text;
namespace SQLFlowClient
{
public class Options
{
[Value(0, MetaName = "sqlfile", Required = false, HelpText = "Input sqlfile to be processed.")]
public string SQLFile { get; set; }
[Option('g', "graph", Required = false, Default = false, HelpText = "Get the graph from sql.")]
public bool IsGraph { get; set; }
[Option('v', "dbvendor", Required = false, Default = "oracle", HelpText = "Set the database of the sqlfile.")]
public string DBVendor { get; set; }
[Option('r', "showRelationType", Required = false, Default = "fdd", HelpText = "Set the relation type.")]
public string ShowRelationType { get; set; }
[Option('s', "simpleOutput", Required = false, Default = false, HelpText = "Set whether to get simple output.")]
public bool SimpleOutput { get; set; }
[Option("ignoreRecordSet", Required = false, Default = false, HelpText = "Set whether to ignore record set.")]
public bool IgnoreRecordSet { get; set; }
[Option("ignoreFunction", Required = false, Default = false, HelpText = "Set whether to ignore function.")]
public bool ignoreFunction { get; set; }
[Option('o', "output", Required = false, Default = "", HelpText = "Save output as a file.")]
public string Output { get; set; }
//[Option('t', "token", Required = false, Default = "", HelpText = "If userId and secretKey is given, token will be ignore, otherwise it will use token.")]
//public string Token { get; set; }
//[Option('u', "userId", Required = false, Default = "", HelpText = "")]
//public string UserId { get; set; }
//[Option('k', "secretKey", Required = false, Default = "", HelpText = "")]
//public string SecretKey { get; set; }
[Option("version", Required = false, Default = false, HelpText = "Show version.")]
public bool Version { get; set; }
}
class Program
{
static void Main(string[] args)
{
var parser = new Parser(with =>
{
with.AutoVersion = false;
with.AutoHelp = true;
});
var parserResult = parser.ParseArguments<Options>(args);
parserResult
.WithParsed(options =>
{
HttpService.Request(options).Wait();
})
.WithNotParsed(errs =>
{
var helpText = HelpText.AutoBuild(parserResult, h =>
{
h.AutoHelp = true;
h.AutoVersion = false;
return h;
}, e => e);
Console.WriteLine(helpText);
});
}
}
}

View File

@ -0,0 +1,17 @@
<?xml version="1.0" encoding="utf-8"?>
<!--
https://go.microsoft.com/fwlink/?LinkID=208121.
-->
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<PropertyGroup>
<Configuration>Release</Configuration>
<Platform>Any CPU</Platform>
<PublishDir>bin\Release\netcoreapp3.0\publish\linux</PublishDir>
<PublishProtocol>FileSystem</PublishProtocol>
<TargetFramework>netcoreapp3.0</TargetFramework>
<RuntimeIdentifier>linux-x64</RuntimeIdentifier>
<SelfContained>true</SelfContained>
<PublishSingleFile>True</PublishSingleFile>
<PublishTrimmed>False</PublishTrimmed>
</PropertyGroup>
</Project>

View File

@ -0,0 +1,17 @@
<?xml version="1.0" encoding="utf-8"?>
<!--
https://go.microsoft.com/fwlink/?LinkID=208121.
-->
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<PropertyGroup>
<Configuration>Release</Configuration>
<Platform>Any CPU</Platform>
<PublishDir>bin\Release\netcoreapp3.0\publish\osx\</PublishDir>
<PublishProtocol>FileSystem</PublishProtocol>
<TargetFramework>netcoreapp3.0</TargetFramework>
<RuntimeIdentifier>osx-x64</RuntimeIdentifier>
<SelfContained>true</SelfContained>
<PublishSingleFile>True</PublishSingleFile>
<PublishTrimmed>False</PublishTrimmed>
</PropertyGroup>
</Project>

View File

@ -0,0 +1,18 @@
<?xml version="1.0" encoding="utf-8"?>
<!--
https://go.microsoft.com/fwlink/?LinkID=208121.
-->
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<PropertyGroup>
<Configuration>Release</Configuration>
<Platform>Any CPU</Platform>
<PublishDir>bin\Release\netcoreapp3.0\publish\win\</PublishDir>
<PublishProtocol>FileSystem</PublishProtocol>
<TargetFramework>netcoreapp3.0</TargetFramework>
<RuntimeIdentifier>win-x64</RuntimeIdentifier>
<SelfContained>true</SelfContained>
<PublishSingleFile>True</PublishSingleFile>
<PublishReadyToRun>False</PublishReadyToRun>
<PublishTrimmed>False</PublishTrimmed>
</PropertyGroup>
</Project>

View File

@ -0,0 +1,8 @@
{
"profiles": {
"SQLFlowClient": {
"commandName": "Project",
"commandLineArgs": "test.sql"
}
}
}

View File

@ -0,0 +1,15 @@
using System;
using System.Collections.Generic;
using System.Text;
namespace SQLFlowClient
{
public enum RelationType
{
fdd,
fdr,
frd,
fddi,
join,
}
}

View File

@ -0,0 +1,30 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>netcoreapp3.0</TargetFramework>
<Version>1.0.9</Version>
<AssemblyVersion>1.0.9.0</AssemblyVersion>
<FileVersion>1.0.9.0</FileVersion>
</PropertyGroup>
<ItemGroup>
<Compile Remove="dist\**" />
<EmbeddedResource Remove="dist\**" />
<None Remove="dist\**" />
</ItemGroup>
<ItemGroup>
<PackageReference Include="CommandLineParser" Version="2.6.0" />
<PackageReference Include="Newtonsoft.Json" Version="12.0.3" />
</ItemGroup>
<ItemGroup>
<None Update="config.json">
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
</None>
<None Update="test.sql">
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
</None>
</ItemGroup>
</Project>

View File

@ -0,0 +1,86 @@
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
namespace SQLFlowClient
{
class SQLFlowResult
{
private readonly int maxLength = 24814437;// json will not be formatted if the string length exceeds this number
public string json;
public bool data;
public bool error;
public bool dbobjs;
public bool sqlflow;
public bool graph;
public SQLFlowResult(string text)
{
if (text.Length <= maxLength)
{
var jobject = JObject.Parse(text);
json = jobject.ToString();
data = jobject.SelectToken("data") != null;
error = jobject.SelectToken("error") != null;
dbobjs = jobject.SelectToken("data.dbobjs") != null;
sqlflow = jobject.SelectToken("data.sqlflow") != null;
graph = jobject.SelectToken("data.graph") != null;
}
else
{
json = text;
data = false;
error = false;
dbobjs = false;
sqlflow = false;
graph = false;
using var reader = new JsonTextReader(new StringReader(text));
while (reader.Read())
{
if (reader.Value != null)
{
//Console.WriteLine("Token: {0}, Value: {1} ,Depth{2}", reader.TokenType, reader.Value, reader.Depth);
if (reader.Depth > 3)
{
goto End;
}
if (reader.TokenType.ToString() == "PropertyName")
{
switch (reader.Value.ToString())
{
case "data":
data = true;
break;
case "error":
error = true;
break;
case "dbobjs":
dbobjs = true;
break;
case "sqlflow":
sqlflow = true;
break;
case "graph":
graph = true;
break;
}
}
}
else
{
//Console.WriteLine("Token: {0}", reader.TokenType);
if (error || dbobjs || sqlflow || graph)
{
reader.Skip();
}
}
}
End: { }
}
}
}
}

View File

@ -0,0 +1,5 @@
{
"Host": "https://api.gudusoft.com",
"SecretKey": "d126d0fb1a5a13abb97b160d571f29a2bbaa13861219082da7e9c4d62553ed7c",
"UserId": "auth0|600acd55e68a290069f8a8db"
}

View File

@ -0,0 +1,5 @@
dotnet publish -c Release /p:PublishProfile=Properties\PublishProfiles\linux.pubxml
dotnet publish -c Release /p:PublishProfile=Properties\PublishProfiles\osx.pubxml
dotnet publish -c Release /p:PublishProfile=Properties\PublishProfiles\win.pubxml
if exist dist rd dist /S /Q
xcopy /s .\bin\Release\netcoreapp3.0\publish .\dist\

View File

@ -0,0 +1,33 @@
CREATE VIEW vsal
AS
SELECT a.deptno "Department",
a.num_emp / b.total_count "Employees",
a.sal_sum / b.total_sal "Salary"
FROM (SELECT deptno,
COUNT(*) num_emp,
SUM(sal) sal_sum
FROM scott.emp
WHERE city = 'NYC'
GROUP BY deptno) a,
(SELECT COUNT(*) total_count,
SUM(sal) total_sal
FROM scott.emp
WHERE city = 'NYC') b
;
INSERT ALL
WHEN ottl < 100000 THEN
INTO small_orders
VALUES(oid, ottl, sid, cid)
WHEN ottl > 100000 and ottl < 200000 THEN
INTO medium_orders
VALUES(oid, ottl, sid, cid)
WHEN ottl > 200000 THEN
into large_orders
VALUES(oid, ottl, sid, cid)
WHEN ottl > 290000 THEN
INTO special_orders
SELECT o.order_id oid, o.customer_id cid, o.order_total ottl,
o.sales_rep_id sid, c.credit_limit cl, c.cust_email cem
FROM orders o, customers c
WHERE o.customer_id = c.customer_id;

115
api/csharp/readme.md Normal file
View File

@ -0,0 +1,115 @@
# Get Started
### [Download](https://sqlflow.gudusoft.com/download/) the executable program according to your operating system.
- [windows](https://sqlflow.gudusoft.com/download/win/SQLFlowClient.exe)
- [mac](https://sqlflow.gudusoft.com/download/osx/SQLFlowClient)
- [linux](https://sqlflow.gudusoft.com/download/linux/SQLFlowClient)
### Configuration
#### SQLFlow Cloud server
Create a file named `config.json` in the directory where the executable (.exe) resides, fill in your `SecretKey` and `UserId`, and always set `Host` to `https://api.gudusoft.com`, for example:
```json
{
"Host": "https://api.gudusoft.com",
"SecretKey": "XXX",
"UserId": "XXX"
}
```
If you want to connect to [the SQLFlow Cloud Server](https://sqlflow.gudusoft.com), you may [request a 30-day premium account](https://www.gudusoft.com/request-a-premium-account/) to
[get the necessary userId and secret code](/sqlflow-userid-secret.md).
#### SQLFlow on-premise version
Create a file named `config.json` in the directory where the executable (.exe) resides, always set `UserId` to `gudu|0123456789`, keep `SecretKey` empty, and set `Host` to your server IP, for example:
```json
{
"Host": "http://your server ip:8081",
"SecretKey": "",
"UserId": "gudu|0123456789"
}
```
Please [check here](https://github.com/sqlparser/sqlflow_public/blob/master/install_sqlflow.md) to see how to install the SQLFlow on-premise version on your own server.
### Set permissions
For mac:
```
chmod +x SQLFlowClient
```
For linux:
```
chmod +x SQLFlowClient
```
### Create a simple sql file for testing
For example, test.sql:
```sql
insert into t2 select * from t1;
```
Run the program from command line:
```
./SQLFlowClient test.sql
```
```
./SQLFlowClient test.sql -g
```
# Usage
SQLFlowClient filepath -parameter value
### parameters
| parameter | short | value type | default | description |
| ------------------ | ----- | ------------------------------------------------------------ | ------- | --------------------------------- |
| --graph | -g | boolean | false | Get the graph from sql. |
| --dbvendor | -v | one of the following list :<br />bigquery, couchbase, db2, greenplum, <br />hana , hive, impala , informix, <br />mdx, mysql, netezza, openedge, <br />oracle, postgresql, redshift, snowflake, <br />mssql, sybase, teradata, vertica | oracle | Set the database of the sqlfile. |
| --showRelationType | -r | one or more from the following list :<br /> fdd, fdr, frd, fddi, join | fdd | Set the relation type. |
| --simpleOutput | -s | boolean | false | Set whether to get simple output. |
| --ignoreRecordSet | | boolean | false | Set whether to ignore record set. |
| --ignoreFunction | | boolean | false | Set whether to ignore function. |
| --output | -o | string | "" | Save output as a file. |
| --help | | | | Display this help screen. |
| --version | | | | Display version information. |
### examples
1. SQLFlowClient test.sql
2. SQLFlowClient test.sql -g
3. SQLFlowClient test.sql -g -v oracle
4. SQLFlowClient test.sql -g -v oracle -r fdr
5. SQLFlowClient test.sql -g -v oracle -r fdr,join
6. SQLFlowClient test.sql -g -v oracle -r fdr,join -s
7. SQLFlowClient test.sql -g -v oracle -r fdr,join -s --ignoreRecordSet
8. SQLFlowClient test.sql -g -v oracle -r fdr,join -s --ignoreFunction -o result.txt
# Compile and build on Windows
### Download and install the .NET Core SDK
```
https://dotnet.microsoft.com/download
```
### Download source code
```
git clone https://github.com/sqlparser/sqlflow_public.git
```
### Build from command line
```
dotnet publish -c Release /p:PublishProfile=Properties\PublishProfiles\linux.pubxml
dotnet publish -c Release /p:PublishProfile=Properties\PublishProfiles\osx.pubxml
dotnet publish -c Release /p:PublishProfile=Properties\PublishProfiles\win.pubxml
```
### [Download executable programs](https://sqlflow.gudusoft.com/download//)

88
api/firstdemo.py Normal file
View File

@ -0,0 +1,88 @@
"""
How to get user_id and secret_key: https://docs.gudusoft.com/3.-api-docs/prerequisites#generate-account-secret
once you have user_id and secret_key,
user_id: <YOUR USER ID HERE>
secret_key: <YOUR SECRET KEY HERE>
you can get token by:
curl -X POST "https://api.gudusoft.com/gspLive_backend/user/generateToken" -H "Request-Origion:testClientDemo" -H "accept:application/json;charset=utf-8" -H "Content-Type:application/x-www-form-urlencoded;charset=UTF-8" -d "secretKey=YOUR SECRET KEY" -d "userId=YOUR USER ID HERE"
and then you can use the token to call the api:
curl -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/generation/sqlflow?showRelationType=fdd" -H "Request-Origion:testClientDemo" -H "accept:application/json;charset=utf-8" -H "Content-Type:multipart/form-data" -F "sqlfile=" -F "dbvendor=dbvoracle" -F "ignoreRecordSet=false" -F "simpleOutput=false" -F "sqltext=CREATE VIEW vsal as select * from emp" -F "userId=YOUR USER ID HERE" -F "token=YOUR TOKEN HERE"
"""
# Python code to call the API based on the description:
import requests
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
# Function to get the token
def get_token(user_id, secret_key):
url = "https://api.gudusoft.com/gspLive_backend/user/generateToken"
headers = {
"Request-Origion": "testClientDemo",
"accept": "application/json;charset=utf-8",
"Content-Type": "application/x-www-form-urlencoded;charset=UTF-8"
}
data = {
"secretKey": secret_key,
"userId": user_id
}
response = requests.post(url, headers=headers, data=data, verify=False, proxies=None)
# Check if the request was successful
response.raise_for_status()
# Parse the JSON response
json_response = response.json()
# Check if 'token' key exists directly in the response
if 'token' in json_response:
return json_response['token']
else:
raise ValueError("Token not found in the response.")
# Function to call the SQLFlow API
def call_sqlflow_api(user_id, token, sql_text):
url = "https://api.gudusoft.com/gspLive_backend/sqlflow/generation/sqlflow?showRelationType=fdd"
headers = {
"Request-Origion": "testClientDemo",
"accept": "application/json;charset=utf-8"
}
data = {
"sqlfile": "",
"dbvendor": "dbvoracle",
"ignoreRecordSet": "false",
"simpleOutput": "false",
"sqltext": sql_text,
"userId": user_id,
"token": token
}
response = requests.post(url, headers=headers, data=data)
return response.json()
# Example usage
# How to get user_id and secret_key: https://docs.gudusoft.com/3.-api-docs/prerequisites#generate-account-secret
user_id = "your user id"
secret_key = "your secret key"
sql_text = "CREATE VIEW vsal AS SELECT * FROM emp"
try:
    # Get the token
    token = get_token(user_id, secret_key)
    print("Token:", token)
    # Call the SQLFlow API (kept inside the try block so a failed token request does not crash the script)
    result = call_sqlflow_api(user_id, token, sql_text)
    print(result)
except requests.exceptions.RequestException as e:
    print("Error making request:", e)
except ValueError as e:
    print("Error parsing response:", e)
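# --- Hypothetical follow-up sketch (not part of the original demo): walk the relationship
# records in the response, assuming the data/sqlflow/relationship layout used by the
# sample response embedded in api/java/DataLineageParser.java. Call it only after a
# successful request, e.g. print_lineage_pairs(result).
def print_lineage_pairs(response):
    relationships = response.get("data", {}).get("sqlflow", {}).get("relationship", [])
    for rel in relationships:
        target = rel.get("target", {})
        for src in rel.get("sources", []):
            print(f'{src.get("parentName")}.{src.get("column")} -> '
                  f'{target.get("parentName")}.{target.get("column")}')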

View File

@ -0,0 +1,90 @@
package java;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
/**
 * Parses the relationship chains out of the JSON-format lineage returned by the
 * SQLFlow exportLineageAsJson API.
 *
 * For example, the lineage data in the demo is parsed into the following chains;
 * the goal is a List containing two elements:
 * SCOTT.DEPT -> SCOTT.EMP -> VSAL
 * SCOTT.EMP -> VSAL
 */
public class DataLineageParser {
static class Node {
String value;
String id;
Node next;
public Node(String value, String id) {
this.value = value;
this.id = id;
}
public String key() {
Node node = this.next;
StringBuilder key = new StringBuilder(id);
while (node != null) {
key.append(node.id);
node = node.next;
}
return key.toString();
}
}
public static void main(String[] args) {
String input = "{"jobId":"d9550e491c024d0cbe6e1034604aca17","code":200,"data":{"mode":"global","sqlflow":{"relationship":[{"sources":[{"parentName":"ORDERS","column":"TABLE","coordinates":[],"id":"10000106","parentId":"86"}],"id":"1000012311","type":"fdd","target":{"parentName":"SPECIAL_ORDERS","column":"TABLE","coordinates":[],"id":"10000102","parentId":"82"}},{"sources":[{"parentName":"CUSTOMERS","column":"TABLE","coordinates":[],"id":"10000103","parentId":"94"}],"id":"1000012312","type":"fdd","target":{"parentName":"SPECIAL_ORDERS","column":"TABLE","coordinates":[],"id":"10000102","parentId":"82"}}]}},"sessionId":"8bb7d3da4b687bb7badf01608a739fbebd61309cd5a643cecf079d122095738a_1685604216451"}";
try {
ObjectMapper objectMapper = new ObjectMapper();
JsonNode jsonNode = objectMapper.readTree(input);
JsonNode relationshipNode = jsonNode.path("data").path("sqlflow").path("relationship");
List<Map<String, Object>> dataList = objectMapper.readValue(relationshipNode.toString(), new TypeReference<List<Map<String, Object>>>() {
});
ArrayList<Node> value = new ArrayList<>();
Map<String, Node> nodeMap = new HashMap<>();
for (Map<String, Object> data : dataList) {
List<Map<String, Object>> sources = (List<Map<String, Object>>) data.get("sources");
Map<String, Object> targetNode = (Map<String, Object>) data.get("target");
Node target = new Node((String) targetNode.get("parentName"), (String) targetNode.get("parentId"));
if (!sources.isEmpty()) {
for (Map<String, Object> source : sources) {
String parentId = (String) source.get("parentId");
String parentName = (String) source.get("parentName");
Node sourceNode = new Node(parentName, parentId);
sourceNode.next = target;
value.add(sourceNode);
nodeMap.put(parentId, sourceNode);
}
} else {
value.add(target);
nodeMap.put((String) targetNode.get("parentId"), target);
}
}
for (Node node : value) {
Node next = node.next;
if (next != null) {
String id = next.id;
next = nodeMap.get(id);
if (next != null) {
node.next = next;
}
}
}
HashSet<String> key = new HashSet<>();
Iterator<Node> iterator = value.iterator();
while (iterator.hasNext()) {
Node node = iterator.next();
String k = node.key();
if (key.contains(k)) {
iterator.remove();
}
key.add(k);
}
// `value` now holds the de-duplicated lineage chains, e.g. SCOTT.DEPT -> SCOTT.EMP -> VSAL
} catch (JsonProcessingException e) {
e.printStackTrace();
}
}
}

View File

@ -0,0 +1,3 @@
Manifest-Version: 1.0
Class-Path: lib/fastjson-1.2.47.jar lib/httpclient-4.5.5.jar lib/httpcore-4.4.9.jar lib/httpmime-4.5.6.jar lib/slf4j-api-1.7.25.jar lib/slf4j-log4j12-1.7.25.jar lib/commons-codec-1.10.jar lib/commons-logging-1.2.jar
Main-Class: com.gudusoft.grabit.Runner

3
api/java/MANIFEST.MF Normal file
View File

@ -0,0 +1,3 @@
Manifest-Version: 1.0
Class-Path: lib/fastjson-1.2.47.jar lib/httpclient-4.5.5.jar lib/httpcore-4.4.9.jar lib/httpmime-4.5.6.jar lib/slf4j-api-1.7.25.jar lib/slf4j-log4j12-1.7.25.jar lib/commons-codec-1.10.jar lib/commons-logging-1.2.jar
Main-Class: com.gudusoft.grabit.Runner

38
api/java/compile.bat Normal file
View File

@ -0,0 +1,38 @@
@ECHO OFF
SETLOCAL enableDelayedExpansion
SET cur_dir=%CD%
echo %cur_dir%
SET qddemo=%cur_dir%
SET qddemo_src=%qddemo%\src
SET qddemo_bin=%qddemo%\lib
SET qddemo_class=%qddemo%\class
echo %qddemo_class%
echo %qddemo_bin%
IF EXIST %qddemo_class% RMDIR %qddemo_class%
IF NOT EXIST %qddemo_class% MKDIR %qddemo_class%
cd %cur_dir%
CD %qddemo_src%
FOR /R %%b IN ( . ) DO (
IF EXIST %%b/*.java SET JFILES=!JFILES! %%b/*.java
)
MKDIR %qddemo_class%\lib
XCOPY %qddemo_bin% %qddemo_class%\lib
XCOPY %qddemo%\MANIFEST.MF %qddemo_class%
cd %cur_dir%
javac -d %qddemo_class% -encoding utf-8 -cp .;%qddemo_bin%\commons-codec-1.10.jar;%qddemo_bin%\commons-logging-1.2.jar;%qddemo_bin%\fastjson-1.2.47.jar;%qddemo_bin%\httpclient-4.5.5.jar;%qddemo_bin%\httpcore-4.4.9.jar;%qddemo_bin%\httpmime-4.5.6.jar; %JFILES%
cd %qddemo_class%
jar -cvfm %qddemo%\grabit-java.jar %qddemo%\MANIFEST-windwos.MF *
echo "successfully"
pause

28
api/java/compile.sh Executable file
View File

@ -0,0 +1,28 @@
#!/bin/bash
cur_dir=$(pwd)
function compile(){
src_dir=$cur_dir/src
bin_dir=$cur_dir/lib
class_dir=$cur_dir/class
rm -rf $src_dir/sources.list
find $src_dir -name "*.java" > $src_dir/sources.list
cat $src_dir/sources.list
rm -rf $class_dir
mkdir $class_dir
cp $cur_dir/MANIFEST.MF $class_dir
cp -r $cur_dir/lib $class_dir
javac -d $class_dir -cp .:$bin_dir/fastjson-1.2.47.jar:$bin_dir/commons-codec-1.10.jar:$bin_dir/commons-logging-1.2.jar:$bin_dir/slf4j-api-1.7.25.jar:$bin_dir/slf4j-log4j12-1.7.25.jar:$bin_dir/httpcore-4.4.9.jar:$bin_dir/httpclient-4.5.5.jar:$bin_dir/httpmime-4.5.6.jar -g -sourcepath $src_dir @$src_dir/sources.list
cd $class_dir
jar -cvfm $cur_dir/grabit-java.jar MANIFEST.MF *
}
compile
exit 0

Binary image file added (196 KiB); content not shown.

File diff suppressed because it is too large.

View File

@ -0,0 +1,56 @@
-- sql server sample sql
CREATE TABLE dbo.EmployeeSales
( DataSource varchar(20) NOT NULL,
BusinessEntityID varchar(11) NOT NULL,
LastName varchar(40) NOT NULL,
SalesDollars money NOT NULL
);
GO
CREATE PROCEDURE dbo.uspGetEmployeeSales
AS
SET NOCOUNT ON;
SELECT 'PROCEDURE', sp.BusinessEntityID, c.LastName,
sp.SalesYTD
FROM Sales.SalesPerson AS sp
INNER JOIN Person.Person AS c
ON sp.BusinessEntityID = c.BusinessEntityID
WHERE sp.BusinessEntityID LIKE '2%'
ORDER BY sp.BusinessEntityID, c.LastName;
GO
--INSERT...SELECT example
INSERT INTO dbo.EmployeeSales
SELECT 'SELECT', sp.BusinessEntityID, c.LastName, sp.SalesYTD
FROM Sales.SalesPerson AS sp
INNER JOIN Person.Person AS c
ON sp.BusinessEntityID = c.BusinessEntityID
WHERE sp.BusinessEntityID LIKE '2%'
ORDER BY sp.BusinessEntityID, c.LastName;
GO
CREATE VIEW hiredate_view
AS
SELECT p.FirstName, p.LastName, e.BusinessEntityID, e.HireDate
FROM HumanResources.Employee e
JOIN Person.Person AS p ON e.BusinessEntityID = p.BusinessEntityID ;
GO
CREATE VIEW view1
AS
SELECT fis.CustomerKey, fis.ProductKey, fis.OrderDateKey,
fis.SalesTerritoryKey, dst.SalesTerritoryRegion
FROM FactInternetSales AS fis
LEFT OUTER JOIN DimSalesTerritory AS dst
ON (fis.SalesTerritoryKey=dst.SalesTerritoryKey);
GO
SELECT ROW_NUMBER() OVER(PARTITION BY PostalCode ORDER BY SalesYTD DESC) AS "Row Number",
p.LastName, s.SalesYTD, a.PostalCode
FROM Sales.SalesPerson AS s
INNER JOIN Person.Person AS p
ON s.BusinessEntityID = p.BusinessEntityID
INNER JOIN Person.Address AS a
ON a.AddressID = p.BusinessEntityID
WHERE TerritoryID IS NOT NULL
AND SalesYTD <> 0
ORDER BY PostalCode;

Binary image file added (54 KiB); content not shown.

156
api/java/readme.md Normal file
View File

@ -0,0 +1,156 @@
## JAVA Data lineage: using the SQLFlow REST API (Advanced)
This article illustrates how to discover the data lineage using JAVA and the SQLFlow REST API.
By using the SQLFlow REST API, you can code in JAVA to discover the data lineage in SQL scripts
and get the result in an actionable diagram, json, csv or graphml format.
You can integrate the JAVA code provided here into your own project and add powerful
data lineage analysis capability instantly.
### 1. interactive data lineage visualizations
![JAVA Data lineage](java-data-lineage.png)
### 2. [Data lineage in JSON format](java-data-lineage-result.json)
### 3. Data lineage in CSV, graphml format
## Prerequisites
- [SQLFlow Cloud Server or on-premise version](https://github.com/sqlparser/sqlflow_public/tree/master/api#prerequisites)
- Java 8 or higher version must be installed and configured correctly.
- setup the PATH like this: (Please change the JAVA_HOME according to your environment)
```
export JAVA_HOME=/usr/lib/jvm/default-java
export PATH=$JAVA_HOME/bin:$PATH
```
- compile and build `grabit-java.jar`
**mac&linux**
```
chmod 777 compile.sh
./compile.sh
```
**windows**
```
compile.bat
```
### Usage
````
java -jar grabit-java.jar /s server /p port /u userId /k userSecret /t databaseType /f path_to_config_file /r resultType
eg:
java -jar grabit-java.jar /u 'auth0|xxx' /k cab9712c45189014a94a8b7aceeef7a3db504be58e18cd3686f3bbefd078ef4d /s https://api.gudusoft.com /t oracle /f demo.sql /r 1
note:
If the parameter string contains symbols like "|", it must be enclosed in single quotes (' '), or in double quotes (" ") on Windows.
````
Example:
1. Connect to the SQLFlow Cloud Server
```
java -jar grabit-java.jar /s https://api.gudusoft.com /u 'YOUR_USER_ID' /k YOUR_SECRET_KEY /t sqlserver /f java-data-lineage-sqlserver.sql /r 1
```
2. Connect to the SQLFlow on-premise
This will discover data lineage by analyzing the `java-data-lineage-sqlserver.sql` file. You may also specify a zip file which includes lots of SQL files.
```
java -jar grabit-java.jar /s http://127.0.0.1 /p 8081 /u 'gudu|0123456789' /t sqlserver /f java-data-lineage-sqlserver.sql /r 1
```
This will discover data lineage by analyzing all SQL files under `sqlfiles` directory.
```
java -jar grabit-java.jar /s http://127.0.0.1 /p 8081 /u 'gudu|0123456789' /t mysql /f sqlfiles /r 1
```
After execution, view the `logs/graibt.log` file for detailed information.
If the log prints **submit the job to sqlflow successful**, the upload to SQLFlow has succeeded.
Log in to the SQLFlow website to view the newly analyzed results.
In the `Job List`, you can view the analysis results of the currently submitted tasks.
### Parameters
- **path_to_config_file**
This can be a single SQL file, a zip file including multiple SQL files, or a directory including lots of SQL files.
- **server**
Usually, it is the IP address of [the SQLFlow on-premise version](https://www.gudusoft.com/sqlflow-on-premise-version/)
installed on your own servers, such as `127.0.0.1` or `http://127.0.0.1`.
You may set the value to `https://api.gudusoft.com` if you like to send your SQL script to [the SQLFlow Cloud Server](https://sqlflow.gudusoft.com) to get the data lineage result.
- **port**
The default value is `8081` if you connect to your SQLFlow on-premise server.
However, if you set up an nginx reverse proxy in the nginx configuration file like this:
```
location /api/ {
proxy_pass http://127.0.0.1:8081/;
proxy_connect_timeout 600s ;
proxy_read_timeout 600s;
proxy_send_timeout 600s;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header User-Agent $http_user_agent;
}
```
Then, keep the value of `serverPort` empty and set `server` to the value like this: `http://127.0.0.1/api`.
>Please keep this value empty if you connect to the SQLFlow Cloud Server by specifying the `https://api.gudusoft.com`
in the `server`
>
- **userId, userSecret**
This is the user id that is used to connect to the SQLFlow server.
Always set this value to `gudu|0123456789` and keep `userSecret` empty if you use the SQLFlow on-premise version.
If you want to connect to [the SQLFlow Cloud Server](https://sqlflow.gudusoft.com), you may [request a 30-day premium account](https://www.gudusoft.com/request-a-premium-account/) to
[get the necessary userId and secret code](/sqlflow-userid-secret.md).
- **databaseType**
This parameter specifies the database dialect of the SQL scripts that SQLFlow analyzes.
```txt
access,bigquery,couchbase,dax,db2,greenplum,hana,hive,impala,informix,mdx,mssql,
sqlserver,mysql,netezza,odbc,openedge,oracle,postgresql,postgres,redshift,snowflake,
sybase,teradata,soql,vertica
```
- **resultType**
When you submit a SQL script to the SQLFlow server, a job is created on the server,
and you can always view the graphical data lineage result in the browser.
In addition, this demo fetches the data lineage back to the directory where the demo is running;
those results are stored in the `data/result/` directory.
This parameter specifies the format used to save the data lineage result.
Available values for this parameter:
- 1: JSON, data lineage result in JSON.
- 2: CSV, data lineage result in CSV format.
- 3: diagram, in graphml format that can be viewed by yEd.
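
For example, with the nginx reverse proxy shown above in place, the on-premise invocation from the examples becomes the following (same demo file and default on-premise account as earlier; only the `server` value changes and `/p` is omitted):
```
java -jar grabit-java.jar /s http://127.0.0.1/api /u 'gudu|0123456789' /t sqlserver /f java-data-lineage-sqlserver.sql /r 1
```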
### SQLFlow REST API
Please check here for the detailed information about the [SQLFlow REST API](https://github.com/sqlparser/sqlflow_public/tree/master/api/sqlflow_api.md)

@ -0,0 +1,28 @@
package com.gudusoft.grabit;
import java.text.SimpleDateFormat;
import java.util.Date;
public class DateUtil {
public DateUtil() {
}
public static String format(Date date) {
return format(date, "yyyyMMdd");
}
public static String format(Date date, String pattern) {
if (date != null) {
SimpleDateFormat df = new SimpleDateFormat(pattern);
return df.format(date);
} else {
return null;
}
}
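    // Note: despite the parameter name, the value is treated as epoch milliseconds (Runner passes System.currentTimeMillis()).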
public static String timeStamp2Date(Long seconds) {
SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd");
return sdf.format(new Date(seconds));
}
}

@ -0,0 +1,89 @@
package com.gudusoft.grabit;
import java.io.*;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
public class FileUtil {
private static final int BUFFER_SIZE = 10 * 1024 * 1024;
private FileUtil() {
}
public static void mkFile(String filePath) throws IOException {
File testFile = new File(filePath);
File fileParent = testFile.getParentFile();
if (!fileParent.exists()) {
fileParent.mkdirs();
}
if (!testFile.exists()) {
testFile.createNewFile();
}
}
public static void toZip(String srcDir, OutputStream out, boolean KeepDirStructure)
throws RuntimeException {
ZipOutputStream zos = null;
try {
zos = new ZipOutputStream(out);
File sourceFile = new File(srcDir);
compress(sourceFile, zos, sourceFile.getName(), KeepDirStructure);
} catch (Exception e) {
throw new RuntimeException("zip error from ZipUtils", e);
} finally {
if (zos != null) {
try {
zos.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
private static void compress(File sourceFile, ZipOutputStream zos, String name,
boolean KeepDirStructure) throws Exception {
byte[] buf = new byte[BUFFER_SIZE];
if (sourceFile.isFile()) {
zos.putNextEntry(new ZipEntry(name));
int len;
FileInputStream in = new FileInputStream(sourceFile);
while ((len = in.read(buf)) != -1) {
zos.write(buf, 0, len);
}
zos.closeEntry();
in.close();
} else {
File[] listFiles = sourceFile.listFiles();
if (listFiles == null || listFiles.length == 0) {
if (KeepDirStructure) {
zos.putNextEntry(new ZipEntry(name + "/"));
zos.closeEntry();
}
} else {
for (File file : listFiles) {
if (KeepDirStructure) {
compress(file, zos, name + "/" + file.getName(), KeepDirStructure);
} else {
compress(file, zos, file.getName(), KeepDirStructure);
}
}
}
}
}
public static OutputStream outStream(String path) throws IOException {
FileOutputStream fileOutputStream;
try {
fileOutputStream = new FileOutputStream(path);
} catch (Exception ex) {
mkFile(path);
fileOutputStream = new FileOutputStream(path);
}
return fileOutputStream;
}
}

@ -0,0 +1,182 @@
package com.gudusoft.grabit;
import com.alibaba.fastjson.JSONObject;
import com.gudusoft.grabit.SqlFlowUtil;
import com.gudusoft.grabit.DateUtil;
import com.gudusoft.grabit.FileUtil;
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
public class Runner {
public static void main(String[] args) throws IOException {
if (args.length < 2) {
System.err.println("please enter the correct parameters.");
return;
}
List<String> argList = Arrays.asList(args);
matchParam("/f", argList);
String fileVal = detectParam("/f", args, argList);
File file = new File(fileVal);
if (!file.exists()) {
System.err.println("{} is not exist." + file);
return;
}
matchParam("/s", argList);
String server = detectParam("/s", args, argList);
if (!server.startsWith("http") && !server.startsWith("https")) {
server = "http://" + server;
}
if (server.endsWith(File.separator)) {
server = server.substring(0, server.length() - 1);
}
if (argList.contains("/p") && argList.size() > argList.indexOf("/p") + 1) {
server = server + ":" + detectParam("/p", args, argList);
}
matchParam("/u", argList);
String userId = detectParam("/u", args, argList).replace("'", "");
String userSecret = "";
if (argList.contains("/k") && argList.size() > argList.indexOf("/k") + 1) {
userSecret = detectParam("/k", args, argList);
}
String databaseType = "dbvoracle";
if (argList.contains("/t") && argList.size() > argList.indexOf("/t") + 1) {
databaseType = "dbv" + detectParam("/t", args, argList);
if ("dbvsqlserver".equalsIgnoreCase(databaseType)) {
databaseType = "dbvmssql";
}
}
int resultType = 1;
if (argList.contains("/r") && argList.size() > argList.indexOf("/r") + 1) {
resultType = Integer.parseInt(detectParam("/r", args, argList));
}
System.out.println("================= run start grabit ==================");
run(file, server, userId, userSecret, databaseType, resultType);
System.out.println("================= run end grabit ==================");
}
private static void run(File file, String server, String userId, String userSecret, String databaseType, Integer resultType) throws IOException {
String tokenUrl = String.format("%s/gspLive_backend/user/generateToken", server);
String token = SqlFlowUtil.getToken(tokenUrl, userId, userSecret, 0);
if ("".equals(token)) {
System.err.println("connection to sqlflow failed.");
System.exit(1);
}
String path = "";
if (file.isDirectory()) {
path = file.getPath() + ".zip";
FileUtil.toZip(file.getPath(), FileUtil.outStream(path), true);
} else if (file.isFile()) {
path = file.getPath();
}
String submitUrl = String.format("%s/gspLive_backend/sqlflow/job/submitUserJob", server);
final String taskName = DateUtil.format(new Date()) + "_" + System.currentTimeMillis();
String result = SqlFlowUtil.submitJob(path, submitUrl,
databaseType,
userId, token,
taskName);
JSONObject object = JSONObject.parseObject(result);
if (null != object) {
Integer code = object.getInteger("code");
if (code == 200) {
JSONObject data = object.getJSONObject("data");
System.out.println("submit job to sqlflow successful. SQLFlow is being analyzed...");
String jobId = data.getString("jobId");
String jsonJobUrl = String.format("%s/gspLive_backend/sqlflow/job/displayUserJobSummary", server);
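                // Poll displayUserJobSummary until the job reports success, partial_success, or fail.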
while (true) {
String statusRs = SqlFlowUtil.getStatus(jsonJobUrl, userId, token, jobId);
JSONObject statusObj = JSONObject.parseObject(statusRs);
if (null != statusObj) {
if (statusObj.getInteger("code") == 200) {
JSONObject val = statusObj.getJSONObject("data");
String status = val.getString("status");
if ("success".equals(status) || "partial_success".equals(status)) {
System.out.println("sqlflow analyze successful.");
break;
}
if ("fail".equals(status)) {
System.err.println(val.getString("errorMessage"));
System.exit(1);
}
}
}
}
String rsUrl = "";
String downLoadPath = "";
String rootPath = "data" + File.separator + "result" + File.separator + DateUtil.timeStamp2Date(System.currentTimeMillis()) + "_" + jobId;
switch (resultType) {
case 1:
rsUrl = String.format("%s/gspLive_backend/sqlflow/job/exportLineageAsJson", server);
downLoadPath = rootPath + "_json.json";
break;
case 2:
rsUrl = String.format("%s/gspLive_backend/sqlflow/job/exportLineageAsCsv", server);
downLoadPath = rootPath + "_csv.csv";
break;
case 3:
rsUrl = String.format("%s/gspLive_backend/sqlflow/job/exportLineageAsGraphml", server);
downLoadPath = rootPath + "_graphml.graphml";
break;
default:
break;
}
SqlFlowUtil.ExportLineageReq request = new SqlFlowUtil.ExportLineageReq();
request.setToken(token);
request.setJobId(jobId);
request.setTableToTable(true);
request.setUserId(userId);
request.setUrl(rsUrl);
request.setDownloadFilePath(downLoadPath);
System.out.println("start export result from sqlflow.");
result = SqlFlowUtil.exportLineage(request);
if (!result.contains("success")) {
System.err.println("export json result failed");
System.exit(1);
}
System.out.println("export json result successful,downloaded file path is {}" + downLoadPath);
} else {
System.err.println("submit job to sqlflow failed.");
System.exit(1);
}
}
}
private static String detectParam(String param, String[] args, List<String> argList) {
try {
return args[argList.indexOf(param) + 1];
} catch (Exception e) {
System.err.println("Please enter the correct parameters.");
System.exit(1);
}
return null;
}
private static void matchParam(String param, List<String> argList) {
if (!argList.contains(param) || argList.size() <= argList.indexOf(param) + 1) {
System.err.println("{} parameter is required." + param);
System.exit(1);
}
}
}

@ -0,0 +1,227 @@
package com.gudusoft.grabit;
import com.alibaba.fastjson.JSONObject;
import org.apache.http.HttpEntity;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.mime.MultipartEntityBuilder;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;
import java.io.*;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class SqlFlowUtil {
private static String token = "";
private SqlFlowUtil() {
}
public static String getToken(String url, String userId,
String secretKey, Integer flag) {
try {
System.out.println("start get token from sqlflow.");
Map<String, String> param = new HashMap<>();
param.put("secretKey", secretKey);
param.put("userId", userId);
if ("gudu|0123456789".equals(userId)) {
return "token";
}
String result = doPost(url, param);
JSONObject object = JSONObject.parseObject(result);
if ("200".equals(object.getString("code"))) {
token = object.getString("token");
System.out.println("get token from sqlflow successful.");
return token;
}
return "";
} catch (Exception e) {
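            // On the first failure, retry once over https before giving up.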
if (flag == 0) {
if (url.startsWith("http:")) {
url = url.replace("http", "https");
}
return getToken(url, userId,
secretKey, 1);
}
if (flag == 1) {
System.err.println("get token from sqlflow failed.");
}
return token;
}
}
public static String submitJob(String filePath,
String url,
String dbVendor,
String userId,
String token,
String jobName) throws IOException {
System.out.println("start submit job to sqlflow.");
CloseableHttpClient httpClient = HttpClients.createDefault();
HttpPost uploadFile = new HttpPost(url);
MultipartEntityBuilder builder = MultipartEntityBuilder.create();
builder.addTextBody("dbvendor", dbVendor, ContentType.TEXT_PLAIN);
builder.addTextBody("jobName", jobName, ContentType.TEXT_PLAIN);
builder.addTextBody("token", token, ContentType.TEXT_PLAIN);
builder.addTextBody("userId", userId, ContentType.TEXT_PLAIN);
File f = new File(filePath);
builder.addBinaryBody("sqlfiles", new FileInputStream(f), ContentType.APPLICATION_OCTET_STREAM, f.getName());
HttpEntity multipart = builder.build();
uploadFile.setEntity(multipart);
CloseableHttpResponse response = httpClient.execute(uploadFile);
HttpEntity responseEntity = response.getEntity();
return EntityUtils.toString(responseEntity, "UTF-8");
}
public static String getStatus(String url,
String userId,
String token,
String jobId) throws IOException {
CloseableHttpClient httpClient = HttpClients.createDefault();
HttpPost uploadFile = new HttpPost(url);
MultipartEntityBuilder builder = MultipartEntityBuilder.create();
builder.addTextBody("jobId", jobId, ContentType.TEXT_PLAIN);
builder.addTextBody("token", token, ContentType.TEXT_PLAIN);
builder.addTextBody("userId", userId, ContentType.TEXT_PLAIN);
HttpEntity multipart = builder.build();
uploadFile.setEntity(multipart);
CloseableHttpResponse response = httpClient.execute(uploadFile);
HttpEntity responseEntity = response.getEntity();
return EntityUtils.toString(responseEntity, "UTF-8");
}
public static String exportLineage(ExportLineageReq req) throws IOException {
CloseableHttpClient httpClient = HttpClients.createDefault();
HttpPost uploadFile = new HttpPost(req.getUrl());
MultipartEntityBuilder builder = MultipartEntityBuilder.create();
builder.addTextBody("jobId", req.getJobId(), ContentType.TEXT_PLAIN);
builder.addTextBody("userId", req.getUserId(), ContentType.TEXT_PLAIN);
builder.addTextBody("token", req.getToken(), ContentType.TEXT_PLAIN);
builder.addTextBody("tableToTable", String.valueOf(req.getTableToTable()), ContentType.TEXT_PLAIN);
HttpEntity multipart = builder.build();
uploadFile.setEntity(multipart);
CloseableHttpResponse response = httpClient.execute(uploadFile);
HttpEntity responseEntity = response.getEntity();
InputStream in = responseEntity.getContent();
FileUtil.mkFile(req.getDownloadFilePath());
File file = new File(req.getDownloadFilePath());
FileOutputStream fout = new FileOutputStream(file);
int a;
byte[] tmp = new byte[1024];
while ((a = in.read(tmp)) != -1) {
fout.write(tmp, 0, a);
}
fout.flush();
fout.close();
in.close();
return "download success, path:" + req.getDownloadFilePath();
}
private static String doPost(String url, Map<String, String> param) {
CloseableHttpClient httpClient = HttpClients.createDefault();
CloseableHttpResponse response = null;
String resultString = "";
try {
HttpPost httpPost = new HttpPost(url);
if (param != null) {
List<NameValuePair> paramList = new ArrayList<>();
for (String key : param.keySet()) {
paramList.add(new BasicNameValuePair(key, param.get(key)));
}
UrlEncodedFormEntity entity = new UrlEncodedFormEntity(paramList, "utf-8");
httpPost.setEntity(entity);
}
response = httpClient.execute(httpPost);
resultString = EntityUtils.toString(response.getEntity(), "utf-8");
} catch (Exception e) {
e.printStackTrace();
} finally {
try {
response.close();
} catch (IOException e) {
e.printStackTrace();
}
}
return resultString;
}
public static class ExportLineageReq {
private String jobId;
private String userId;
private String token;
private String url;
private String downloadFilePath;
private Boolean tableToTable = false;
public String getJobId() {
return jobId;
}
public void setJobId(String jobId) {
this.jobId = jobId;
}
public String getUserId() {
return userId;
}
public void setUserId(String userId) {
this.userId = userId;
}
public String getToken() {
return token;
}
public void setToken(String token) {
this.token = token;
}
public Boolean getTableToTable() {
return tableToTable;
}
public void setTableToTable(Boolean tableToTable) {
this.tableToTable = tableToTable;
}
public String getUrl() {
return url;
}
public void setUrl(String url) {
this.url = url;
}
public String getDownloadFilePath() {
return downloadFilePath;
}
public void setDownloadFilePath(String downloadFilePath) {
this.downloadFilePath = downloadFilePath;
}
}
}

@ -0,0 +1,4 @@
/Users/g7/Documents/project/sqlflow_public/api/java/src/main/java/com/gudusoft/grabit/FileUtil.java
/Users/g7/Documents/project/sqlflow_public/api/java/src/main/java/com/gudusoft/grabit/Runner.java
/Users/g7/Documents/project/sqlflow_public/api/java/src/main/java/com/gudusoft/grabit/SqlFlowUtil.java
/Users/g7/Documents/project/sqlflow_public/api/java/src/main/java/com/gudusoft/grabit/DateUtil.java

api/job-types.png Normal file

151
api/php/Grabit.php Normal file
@ -0,0 +1,151 @@
<?php
class Grabit
{
function run($argv)
{
if (sizeof($argv) < 2) {
echo 'please enter the correct parameters.';
exit(1);
}
$userSecret = '';
$userId = '';
$dbvendor = '';
$sqlfiles = '';
$server = '';
$port = '';
$download = 1;
for ($i = 0; $i < sizeof($argv) - 1; $i++) {
if ($argv[$i] == '/s') {
$server = $argv[$i + 1];
}
if ($argv[$i] == '/p') {
$port = $argv[$i + 1];
}
if ($argv[$i] == '/f') {
$sqlfiles = $argv[$i + 1];
if (!file_exists($sqlfiles)) {
echo "The file is no exists";
exit(1);
}
}
if ($argv[$i] == '/u') {
$userId = $argv[$i + 1];
$userId = str_replace("'", '', $userId);
}
if ($argv[$i] == '/t') {
$dbvendor = 'dbv' . $argv[$i + 1];
if ($dbvendor == 'dbvsqlserver') {
$dbvendor = 'dbvmssql';
}
}
if ($argv[$i] == '/k') {
$userSecret = $argv[$i + 1];
}
if ($argv[$i] == '/r') {
$download = $argv[$i + 1];
}
}
        if (substr($server, 0, 4) !== "http" && substr($server, 0, 5) !== "https") {
$server = "http://" . $server;
}
if (substr($server, -strlen(DIRECTORY_SEPARATOR)) === DIRECTORY_SEPARATOR) {
$server = substr($server, 0, strlen($server) - 1);
}
if ($port != '') {
$server = $server . ':' . $port;
}
echo '===================================== start =====================================';
echo PHP_EOL;
echo('start get token.');
echo PHP_EOL;
include('SqlFlowUtil.php');
$obj = new SqlFlowUtil();
$token = $obj->getToken($server, $userId, $userSecret);
echo 'get token successful.';
echo PHP_EOL;
if (is_dir($sqlfiles)) {
if (substr($sqlfiles, -strlen(DIRECTORY_SEPARATOR)) === DIRECTORY_SEPARATOR) {
$sqlfiles = rtrim($sqlfiles, DIRECTORY_SEPARATOR);
}
$zip = new \ZipArchive();
$sqlfileDir = $sqlfiles . '.zip';
if (file_exists($sqlfileDir)) {
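                    // PATH_SEPARATOR is ':' on Linux/macOS; on Windows the path is converted to GBK before unlink.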
if (PATH_SEPARATOR == ':') {
unlink($sqlfileDir);
} else {
$url = iconv('utf-8', 'gbk', $sqlfileDir);
unlink($url);
}
}
$open = $zip->open($sqlfileDir, \ZipArchive::CREATE);
if ($open === true) {
$this->toZip($sqlfiles, $zip);
$zip->close();
}
$sqlfiles = $sqlfileDir;
}
echo 'start submit job.';
echo PHP_EOL;
$result = $obj->submitJob($server, $userId, $token, $sqlfiles, time(), $dbvendor);
if ($result['code'] == 200) {
echo 'submit job successful.';
echo PHP_EOL;
$jobId = $result['data']['jobId'];
while (true) {
$result = $obj->getStatus($server, $userId, $token, $jobId);
if ($result['code'] == 200) {
$status = $result['data']['status'];
if ($status == 'partial_success' || $status == 'success') {
break;
}
if ($status == 'fail') {
echo 'job execution failed.';
exit(1);
}
}
}
echo $status;
echo 'start get result from sqlflow.';
echo PHP_EOL;
$filePath = $obj->getResult($server, $userId, $token, $jobId, $download);
echo 'get result from sqlflow successful. file path is : ' . $filePath;
} else {
echo 'submit job failed.';
}
echo PHP_EOL;
echo '===================================== end =====================================';
}
function toZip($path, $zip)
{
$handler = opendir($path);
while (($filename = readdir($handler)) !== false) {
if ($filename != "." && $filename != "..") {
if (is_dir($path . DIRECTORY_SEPARATOR . $filename)) {
$obj = new Grabit();
$obj->toZip($path . DIRECTORY_SEPARATOR . $filename, $zip);
} else {
$zip->addFile($path . DIRECTORY_SEPARATOR . $filename);
$zip->renameName($path . DIRECTORY_SEPARATOR . $filename, $filename);
}
}
}
        @closedir($handler);
}
}
$obj = new Grabit();
$obj->run($argv);

110
api/php/HttpClient.php Normal file
@ -0,0 +1,110 @@
<?php
class HttpClient
{
protected static $url;
protected static $delimiter;
function mkdirs($a1, $mode = 0777)
{
if (is_dir($a1) || @mkdir($a1, $mode)) return TRUE;
if (!static::mkdirs(dirname($a1), $mode)) return FALSE;
return @mkdir($a1, $mode);
}
public function __construct()
{
static::$delimiter = uniqid();
}
private static function buildData($param)
{
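        // Manually build a multipart/form-data body: regular fields first, then the SQL payload as an application/octet-stream part named "sqlfiles".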
$data = '';
$eol = "\r\n";
$upload = $param['sqlfiles'];
unset($param['sqlfiles']);
foreach ($param as $name => $content) {
$data .= "--" . static::$delimiter . "\r\n"
. 'Content-Disposition: form-data; name="' . $name . "\"\r\n\r\n"
. $content . "\r\n";
}
$data .= "--" . static::$delimiter . $eol
. 'Content-Disposition: form-data; name="sqlfiles"; filename="' . $param['filename'] . '"' . "\r\n"
. 'Content-Type:application/octet-stream' . "\r\n\r\n";
$data .= $upload . "\r\n";
$data .= "--" . static::$delimiter . "--\r\n";
return $data;
}
function postFile($url, $param)
{
$post_data = static::buildData($param);
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_POSTFIELDS, $post_data);
curl_setopt($curl, CURLOPT_HTTPHEADER, [
"Content-Type: multipart/form-data; boundary=" . static::$delimiter,
"Content-Length: " . strlen($post_data)
]);
$response = curl_exec($curl);
curl_close($curl);
$info = json_decode($response, true);
return $info;
}
function postFrom($url, $data)
{
$headers = array('Content-Type: application/x-www-form-urlencoded');
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl, CURLOPT_AUTOREFERER, 1);
curl_setopt($curl, CURLOPT_POST, 1);
curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($data));
curl_setopt($curl, CURLOPT_TIMEOUT, 30);
curl_setopt($curl, CURLOPT_HEADER, 0);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
$result = curl_exec($curl);
if (curl_errno($curl)) {
return 'Errno' . curl_error($curl);
}
curl_close($curl);
return json_decode($result, true);
}
function postJson($url, $data, $filePath)
{
$headers = array('Content-Type: application/x-www-form-urlencoded');
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl, CURLOPT_AUTOREFERER, 1);
curl_setopt($curl, CURLOPT_POST, 1);
curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($data));
curl_setopt($curl, CURLOPT_TIMEOUT, 30);
curl_setopt($curl, CURLOPT_HEADER, 0);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
$result = curl_exec($curl);
if (curl_errno($curl)) {
return 'Errno' . curl_error($curl);
}
$fp = @fopen($filePath, "a");
fwrite($fp, $result);
fclose($fp);
}
}

72
api/php/SqlFlowUtil.php Normal file
@ -0,0 +1,72 @@
<?php
include('HttpClient.php');
class SqlFlowUtil
{
function getToken($server, $userId, $userSecret)
{
if ($userId == 'gudu|0123456789') {
return 'token';
}
$httpVendor = new HttpClient();
$json['userId'] = $userId;
$json['secretKey'] = $userSecret;
$url = $server . '/gspLive_backend/user/generateToken';
$result = $httpVendor->postFrom($url, $json);
return $result['token'];
}
function submitJob($server, $userId, $token, $sqlfiles, $jobName, $dbvendor)
{
$httpVendor = new HttpClient();
$params = array(
'userId' => $userId,
'token' => $token,
'jobName' => $jobName,
'dbvendor' => $dbvendor,
'filename' => $jobName,
'sqlfiles' => file_get_contents($sqlfiles)
);
$url = $server . '/gspLive_backend/sqlflow/job/submitUserJob';
$result = $httpVendor->postFile($url, $params);
return $result;
}
function getStatus($server, $userId, $token, $jobId)
{
$httpVendor = new HttpClient();
$json['userId'] = $userId;
$json['token'] = $token;
$json['jobId'] = $jobId;
$url = $server . '/gspLive_backend/sqlflow/job/displayUserJobSummary';
$result = $httpVendor->postFrom($url, $json);
return $result;
}
function getResult($server, $userId, $token, $jobId, $download)
{
$dir = 'data' . DIRECTORY_SEPARATOR . 'result';
$str = $dir . DIRECTORY_SEPARATOR . date("Ymd") . '_' . $jobId;
$filePath = '';
$url = '';
if ($download == 1) {
$url = $server . '/gspLive_backend/sqlflow/job/exportLineageAsJson';
$filePath = $str . '_json.json';
        } else if ($download == 2) {
            $url = $server . '/gspLive_backend/sqlflow/job/exportLineageAsCsv';
            $filePath = $str . '_csv.csv';
        } else if ($download == 3) {
            $url = $server . '/gspLive_backend/sqlflow/job/exportLineageAsGraphml';
            $filePath = $str . '_graphml.graphml';
}
$httpVendor = new HttpClient();
$json['userId'] = $userId;
$json['token'] = $token;
$json['jobId'] = $jobId;
$httpVendor->mkdirs($dir);
$httpVendor->postJson($url, $json, $filePath);
return $filePath;
}
}

@ -0,0 +1,56 @@
-- sql server sample sql
CREATE TABLE dbo.EmployeeSales
( DataSource varchar(20) NOT NULL,
BusinessEntityID varchar(11) NOT NULL,
LastName varchar(40) NOT NULL,
SalesDollars money NOT NULL
);
GO
CREATE PROCEDURE dbo.uspGetEmployeeSales
AS
SET NOCOUNT ON;
SELECT 'PROCEDURE', sp.BusinessEntityID, c.LastName,
sp.SalesYTD
FROM Sales.SalesPerson AS sp
INNER JOIN Person.Person AS c
ON sp.BusinessEntityID = c.BusinessEntityID
WHERE sp.BusinessEntityID LIKE '2%'
ORDER BY sp.BusinessEntityID, c.LastName;
GO
--INSERT...SELECT example
INSERT INTO dbo.EmployeeSales
SELECT 'SELECT', sp.BusinessEntityID, c.LastName, sp.SalesYTD
FROM Sales.SalesPerson AS sp
INNER JOIN Person.Person AS c
ON sp.BusinessEntityID = c.BusinessEntityID
WHERE sp.BusinessEntityID LIKE '2%'
ORDER BY sp.BusinessEntityID, c.LastName;
GO
CREATE VIEW hiredate_view
AS
SELECT p.FirstName, p.LastName, e.BusinessEntityID, e.HireDate
FROM HumanResources.Employee e
JOIN Person.Person AS p ON e.BusinessEntityID = p.BusinessEntityID ;
GO
CREATE VIEW view1
AS
SELECT fis.CustomerKey, fis.ProductKey, fis.OrderDateKey,
fis.SalesTerritoryKey, dst.SalesTerritoryRegion
FROM FactInternetSales AS fis
LEFT OUTER JOIN DimSalesTerritory AS dst
ON (fis.SalesTerritoryKey=dst.SalesTerritoryKey);
GO
SELECT ROW_NUMBER() OVER(PARTITION BY PostalCode ORDER BY SalesYTD DESC) AS "Row Number",
p.LastName, s.SalesYTD, a.PostalCode
FROM Sales.SalesPerson AS s
INNER JOIN Person.Person AS p
ON s.BusinessEntityID = p.BusinessEntityID
INNER JOIN Person.Address AS a
ON a.AddressID = p.BusinessEntityID
WHERE TerritoryID IS NOT NULL
AND SalesYTD <> 0
ORDER BY PostalCode;

192
api/php/readme.md Normal file
@ -0,0 +1,192 @@
## PHP Data lineage: using the SQLFlow REST API (Advanced)
This article illustrates how to discover the data lineage using PHP and the SQLFlow REST API.
By using the SQLFlow REST API, you can code in PHP to discover the data lineage in SQL scripts
and get the result in an actionable diagram, json, csv or graphml format.
You can integrate the PHP code provided here into your own project and add powerful
data lineage analysis capability instantly.
### 1. interactive data lineage visualizations
![PHP Data lineage](php-data-lineage.png)
### 2. [Data lineage in JSON format](php-data-lineage-result.json)
### 3. Data lineage in CSV, graphml format
## Prerequisites
- [SQLFlow Cloud Server or on-premise version](https://github.com/sqlparser/sqlflow_public/tree/master/api#prerequisites)
- PHP 7.3 or higher version must be installed and configured correctly.
- Install the ZIP extension
**mac**
````
wget http://pecl.php.net/get/zip-1.12.4.tgz
tar zxfv zip-1.12.4.tgz
cd zip-1.12.4
sudo mount -uw /
sudo ln -s /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/usr/include/ /usr
sudo phpize
which php-config   # note the path, e.g. /usr/bin/php-config
./configure --with-php-config=/usr/bin/php-config
sudo mount -uw /
sudo make
sudo make install
cd /usr/lib/php/extensions/no-debug-non-zts-20180731
sudo cp /private/etc/php.ini.default php.ini
chmod 777 php.ini
sudo vim php.ini   # add the line: extension=zip.so
sudo apachectl restart
````
**linux**
````
wget http://pecl.php.net/get/zip-1.12.4.tgz
tar zxfv zip-1.12.4.tgz
cd zip-1.12.4
sudo phpize
which php-config   # note the path, e.g. /usr/bin/php-config
./configure --with-php-config=/usr/bin/php-config
sudo make
sudo make install
cd /usr/lib/php/extensions/no-debug-non-zts-20180731
sudo vi /usr/local/php/etc/php.ini   # add the line: extension=zip.so
sudo apachectl restart
````
#### [Reference Documentation](https://www.php.net/manual/en/install.pecl.phpize.php)
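After installing the extension, you can confirm that it is loaded before running the demo (a quick sanity check; assumes the PHP CLI is on your PATH):
```
php -m | grep zip
```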
### Usage
````
php Grabit.php /s server /p port /u userId /k userSecret /t databaseType /f path_to_config_file /r resultType
eg:
php Grabit.php /u 'auth0|xxx' /k cab9712c45189014a94a8b7aceeef7a3db504be58e18cd3686f3bbefd078ef4d /s https://api.gudusoft.com /t oracle /f demo.sql /r 1
note:
If a parameter value contains symbols such as "|", it must be enclosed in single quotes (' ')
````
Example:
1. Connect to the SQLFlow Cloud Server
```
php Grabit.php /s https://api.gudusoft.com /u 'YOUR_USER_ID' /k YOUR_SECRET_KEY /t sqlserver /f PHP-data-lineage-sqlserver.sql /r 1
```
2. Connect to the SQLFlow on-premise
This discovers data lineage by analyzing the `PHP-data-lineage-sqlserver.sql` file. You may also specify a zip file that contains multiple SQL files.
```
php Grabit.php /s http://127.0.0.1 /p 8081 /u 'gudu|0123456789' /t sqlserver /f PHP-data-lineage-sqlserver.sql /r 1
```
This will discover data lineage by analyzing all SQL files under `sqlfiles` directory.
```
php Grabit.php /s http://127.0.0.1 /p 8081 /u 'gudu|0123456789' /t mysql /f sqlfiles /r 1
```
### Parameters
- **path_to_config_file**
This can be a single SQL file, a zip file including multiple SQL files, or a directory including lots of SQL files.
- **server**
Usually, this is the IP address of [the SQLFlow on-premise version](https://www.gudusoft.com/sqlflow-on-premise-version/)
installed on your own servers, such as `127.0.0.1` or `http://127.0.0.1`.
You may set the value to `https://api.gudusoft.com` if you want to send your SQL script to [the SQLFlow Cloud Server](https://sqlflow.gudusoft.com) to get the data lineage result.
- **port**
The default value is `8081` if you connect to your SQLFlow on-premise server.
However, if you set up an nginx reverse proxy in the nginx configuration file like this:
```
location /api/ {
proxy_pass http://127.0.0.1:8081/;
proxy_connect_timeout 600s ;
proxy_read_timeout 600s;
proxy_send_timeout 600s;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header User-Agent $http_user_agent;
}
```
Then, keep the `port` value empty and set `server` to a value like `http://127.0.0.1/api`.
>Please keep this value empty if you connect to the SQLFlow Cloud Server by specifying `https://api.gudusoft.com`
>in the `server` parameter.
>
- **userId, userSecret**
This is the user id that is used to connect to the SQLFlow server.
Always set this value to `gudu|0123456789` and keep `userSecret` empty if you use the SQLFlow on-premise version.
If you want to connect to [the SQLFlow Cloud Server](https://sqlflow.gudusoft.com), you may [request a 30-day premium account](https://www.gudusoft.com/request-a-premium-account/) to
[get the necessary userId and secret code](/sqlflow-userid-secret.md).
- **databaseType**
This parameter specifies the database dialect of the SQL scripts that SQLFlow will analyze.
```txt
access,bigquery,couchbase,dax,db2,greenplum,hana,hive,impala,informix,mdx,mssql,
sqlserver,mysql,netezza,odbc,openedge,oracle,postgresql,postgres,redshift,snowflake,
sybase,teradata,soql,vertica
```
- **resultType**
When you submit a SQL script to the SQLFlow server, a job is created on the server,
and you can always view the graphical data lineage result in the browser.
In addition, this demo fetches the data lineage back to the directory where the demo is running;
those results are stored in the `data/result/` directory.
This parameter specifies the format used to save the data lineage result.
Available values for this parameter:
- 1: JSON, data lineage result in JSON.
- 2: CSV, data lineage result in CSV format.
- 3: diagram, in graphml format that can be viewed by yEd.
### SQLFlow REST API
Please check here for the detailed information about the [SQLFlow REST API](https://github.com/sqlparser/sqlflow_public/tree/master/api/sqlflow_api.md)

@ -0,0 +1,77 @@
"""
Parse the relationship chains from the JSON-format lineage returned by the
SQLFlow exportLineageAsJson API.

For example, the lineage data in the demo resolves into a list with two chains:
SCOTT.DEPT -> SCOTT.EMP -> VSAL
SCOTT.EMP -> VSAL
"""
import json
class Node:
def __init__(self, value, node_id):
self.value = value
self.id = node_id
self.next = None
def key(self):
node = self.next
key = self.id
while node:
key += node.id
node = node.next
return key
def main():
input_data = '{"jobId":"d9550e491c024d0cbe6e1034604aca17","code":200,"data":{"mode":"global","sqlflow":{"relationship":[{"sources":[{"parentName":"ORDERS","column":"TABLE","coordinates":[],"id":"10000106","parentId":"86"}],"id":"1000012311","type":"fdd","target":{"parentName":"SPECIAL_ORDERS","column":"TABLE","coordinates":[],"id":"10000102","parentId":"82"}},{"sources":[{"parentName":"CUSTOMERS","column":"TABLE","coordinates":[],"id":"10000103","parentId":"94"}],"id":"1000012312","type":"fdd","target":{"parentName":"SPECIAL_ORDERS","column":"TABLE","coordinates":[],"id":"10000102","parentId":"82"}}]}},"sessionId":"8bb7d3da4b687bb7badf01608a739fbebd61309cd5a643cecf079d122095738a_1685604216451"}'
try:
data = json.loads(input_data)
relationship_node = data["data"]["sqlflow"]["relationships"]
data_list = relationship_node
value = []
node_map = {}
for data_item in data_list:
sources = data_item["sources"]
target_node = data_item["target"]
target = Node(target_node["parentName"], target_node["parentId"])
if sources:
for source in sources:
parent_id = source["parentId"]
parent_name = source["parentName"]
source_node = Node(parent_name, parent_id)
source_node.next = target
value.append(source_node)
node_map[parent_id] = source_node
else:
value.append(target)
node_map[target_node["parentId"]] = target
for node in value:
next_node = node.next
if next_node:
next_id = next_node.id
next_node = node_map.get(next_id)
if next_node:
node.next = next_node
        # Drop duplicate chains (list iterators have no remove(); collect unique nodes instead).
        key_set = set()
        unique_nodes = []
        for node in value:
            k = node.key()
            if k not in key_set:
                key_set.add(k)
                unique_nodes.append(node)
        # Render each remaining chain as "A -> B -> C".
        chains = []
        for node in unique_nodes:
            parts = []
            current = node
            while current:
                parts.append(current.value)
                current = current.next
            chains.append(" -> ".join(parts))
        print(chains)
except json.JSONDecodeError as e:
print(e)
if __name__ == "__main__":
main()

@ -0,0 +1,45 @@
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import requests
import json
def getToken(sys, userId, server, port):
if len(sys.argv) < 1:
print('Please enter the args.')
sys.exit(0)
url = '/gspLive_backend/user/generateToken'
screctKey = ''
for i in range(1, len(sys.argv)):
if sys.argv[i] == '/k':
try:
if sys.argv[i + 1] is not None:
screctKey = sys.argv[i + 1]
except Exception:
                print(
                    'Please enter the secret key: the secret key of the sqlflow user for web API requests, required. eg: /k xxx')
sys.exit(0)
if port != '':
url = server + ':' + port + url
else:
url = server + url
mapA = {'secretKey': screctKey, 'userId': userId}
header_dict = {"Content-Type": "application/x-www-form-urlencoded"}
print('start get token.')
try:
r = requests.post(url, data=mapA, headers=header_dict)
except Exception:
print('get token failed.')
sys.exit(0)
result = json.loads(r.text)
if result['code'] == '200':
print('get token successful.')
return result['token']
else:
print(result['error'])
sys.exit(0)

@ -0,0 +1,32 @@
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import requests
import json
import sys
def getStatus(userId, token, server, port, jobId):
url = "/gspLive_backend/sqlflow/job/displayUserJobSummary"
if port != '':
url = server + ':' + port + url
else:
url = server + url
data = {'jobId': jobId, 'token': token, 'userId': userId}
datastr = json.dumps(data)
try:
response = requests.post(url, data=eval(datastr))
except Exception:
print('get job status to sqlflow failed.')
sys.exit(0)
result = json.loads(response.text)
if result['code'] == 200:
status = result['data']['status']
if status == 'fail':
print(result['data']['errorMessage'])
sys.exit(0)
return status

@ -0,0 +1,51 @@
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import requests
import json
import sys
import os
def getResult(download, userId, token, server, port, jobId, filePath):
sep = 'data' + os.sep + 'result' + os.sep
filePath = filePath + '_' + jobId
if download == 'json':
url = "/gspLive_backend/sqlflow/job/exportLineageAsJson"
filePath = sep + filePath + '_json.json'
elif download == 'graphml':
url = "/gspLive_backend/sqlflow/job/exportLineageAsGraphml"
filePath = sep + filePath + '_graphml.graphml'
elif download == 'csv':
url = "/gspLive_backend/sqlflow/job/exportLineageAsCsv"
filePath = sep + filePath + '_csv.csv'
else:
print('Please enter the correct output type.')
sys.exit(0)
if port != '':
url = server + ':' + port + url
else:
url = server + url
data = {'jobId': jobId, 'token': token, 'userId': userId, 'tableToTable': 'false'}
datastr = json.dumps(data)
print('start download result to sqlflow.')
try:
response = requests.post(url, data=eval(datastr))
except Exception:
print('download result to sqlflow failed.')
sys.exit(0)
if not os.path.exists(sep):
os.makedirs(sep)
try:
with open(filePath, 'wb') as f:
f.write(response.content)
except Exception:
        print(filePath, 'could not be written.')
        sys.exit(0)
    print('download result from sqlflow successful. file path is ', filePath)

@ -0,0 +1,130 @@
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import os
import sys
import GetGenerateToken
import SubmitJob
import time
import GetResultToSqlflow
import GetJobStatus
import datetime
if __name__ == '__main__':
print('========================================grabit-python======================================')
userId = ''
dbvendor = ''
sqlfiles = ''
server = ''
port = ''
download = ''
for i in range(1, len(sys.argv)):
if sys.argv[i] == '/u':
try:
if sys.argv[i + 1] is not None:
userId = sys.argv[i + 1]
else:
                    print(
                        'Please enter the userId: the user id of the sqlflow web or client, required. eg: /u gudu|123456789')
                    sys.exit(0)
            except Exception:
                print(
                    'Please enter the userId: the user id of the sqlflow web or client, required. eg: /u gudu|123456789')
if sys.argv[i] == '/t':
try:
if sys.argv[i + 1] is not None:
dbvendor = sys.argv[i + 1]
else:
print(
'Please enter the dbvendor.')
sys.exit(0)
except Exception:
print(
'Please enter the dbvendor.')
if sys.argv[i] == '/f':
try:
if sys.argv[i + 1] is not None:
sqlfiles = sys.argv[i + 1]
else:
                    print(
                        'Please enter the sqlfiles: the SQL file or directory to submit, required. eg: /f path')
                    sys.exit(0)
            except Exception:
                print(
                    'Please enter the sqlfiles: the SQL file or directory to submit, required. eg: /f path')
if sys.argv[i] == '/s':
try:
if sys.argv[i + 1] is not None:
server = sys.argv[i + 1]
else:
print('Please enter the server. eg: /s https://api.gudusoft.com or /s https://127.0.0.1')
sys.exit(0)
except Exception:
print('Please enter the server. eg: /s https://api.gudusoft.com or /s https://127.0.0.1')
sys.exit(0)
if sys.argv[i] == '/p':
try:
if sys.argv[i + 1] is not None:
port = sys.argv[i + 1]
except Exception:
print('Please enter the port. eg: /p 8081')
sys.exit(0)
if sys.argv[i] == '/r':
try:
if sys.argv[i + 1] is not None:
download = sys.argv[i + 1]
except Exception:
print('Please enter the download type to sqlflow,type 1:json 2:csv 3:diagram : eg: /r 1')
sys.exit(0)
if userId == '':
        print('Please enter the userId: the user id of the sqlflow web or client, required. eg: /u gudu|123456789')
sys.exit(0)
if dbvendor == '':
        print(
            'Please enter the dbvendor. Available values: bigquery,couchbase,db2,greenplum,hana,hive,impala,informix,mdx,mysql,netezza,openedge,oracle,postgresql,redshift,snowflake,mssql,sybase,teradata,vertica. eg: /t oracle')
sys.exit(0)
if dbvendor == 'mssql' or dbvendor == 'sqlserver':
dbvendor = 'mssql'
dbvendor = 'dbv' + dbvendor
if sqlfiles == '':
        print(
            'Please enter the sqlfiles: the SQL file or directory to submit, required. eg: /f path')
sys.exit(0)
if server == '':
print('Please enter the server. eg: /s https://api.gudusoft.com or /s https://127.0.0.1')
sys.exit(0)
if server.find('http:') == -1 and server.find('https:') == -1:
server = 'http://' + server
if server.endswith(os.sep):
server = server[:-1]
if server == 'https://sqlflow.gudusoft.com':
server = 'https://api.gudusoft.com'
if userId == 'gudu|0123456789':
token = 'token'
else:
token = GetGenerateToken.getToken(sys, userId, server, port)
time_ = datetime.datetime.now().strftime('%Y%m%d')
jobId = SubmitJob.toSqlflow(userId, token, server, port, time_, dbvendor, sqlfiles)
    if download != '':
        # /r is documented as 1: json, 2: csv, 3: graphml; map it to the type name expected by GetResultToSqlflow.
        download = {'1': 'json', '2': 'csv', '3': 'graphml'}.get(download, download)
        while True:
            status = GetJobStatus.getStatus(userId, token, server, port, jobId)
            if status == 'partial_success' or status == 'success':
                GetResultToSqlflow.getResult(download, userId, token, server, port, jobId, time_)
                break
            time.sleep(2)
print('========================================grabit-python======================================')
sys.exit(0)

@ -0,0 +1,57 @@
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import zipfile
import requests
import json
import sys
import os
def toSqlflow(userId, token, server, port, jobName, dbvendor, sqlfiles):
url = '/gspLive_backend/sqlflow/job/submitUserJob'
if port != '':
url = server + ':' + port + url
else:
url = server + url
if os.path.isdir(sqlfiles):
sqlfiles = toZip(sqlfiles)
files = {'sqlfiles': open(sqlfiles, 'rb')}
data = {'dbvendor': dbvendor, 'jobName': jobName, 'token': token, 'userId': userId}
datastr = json.dumps(data)
print('start submit job to sqlflow.')
try:
response = requests.post(url, data=eval(datastr), files=files)
except Exception:
print('submit job to sqlflow failed.')
sys.exit(0)
result = json.loads(response.text)
if result['code'] == 200:
print('submit job to sqlflow successful.')
return result['data']['jobId']
else:
print(result['error'])
sys.exit(0)
def toZip(start_dir):
if start_dir.endswith(os.sep):
start_dir = start_dir[:-1]
start_dir = start_dir
file_news = start_dir + '.zip'
z = zipfile.ZipFile(file_news, 'w', zipfile.ZIP_DEFLATED)
for dir_path, dir_names, file_names in os.walk(start_dir):
f_path = dir_path.replace(start_dir, '')
f_path = f_path and f_path + os.sep or ''
for filename in file_names:
z.write(os.path.join(dir_path, filename), f_path + filename)
z.close()
return file_news

@ -0,0 +1,56 @@
-- sql server sample sql
CREATE TABLE dbo.EmployeeSales
( DataSource varchar(20) NOT NULL,
BusinessEntityID varchar(11) NOT NULL,
LastName varchar(40) NOT NULL,
SalesDollars money NOT NULL
);
GO
CREATE PROCEDURE dbo.uspGetEmployeeSales
AS
SET NOCOUNT ON;
SELECT 'PROCEDURE', sp.BusinessEntityID, c.LastName,
sp.SalesYTD
FROM Sales.SalesPerson AS sp
INNER JOIN Person.Person AS c
ON sp.BusinessEntityID = c.BusinessEntityID
WHERE sp.BusinessEntityID LIKE '2%'
ORDER BY sp.BusinessEntityID, c.LastName;
GO
--INSERT...SELECT example
INSERT INTO dbo.EmployeeSales
SELECT 'SELECT', sp.BusinessEntityID, c.LastName, sp.SalesYTD
FROM Sales.SalesPerson AS sp
INNER JOIN Person.Person AS c
ON sp.BusinessEntityID = c.BusinessEntityID
WHERE sp.BusinessEntityID LIKE '2%'
ORDER BY sp.BusinessEntityID, c.LastName;
GO
CREATE VIEW hiredate_view
AS
SELECT p.FirstName, p.LastName, e.BusinessEntityID, e.HireDate
FROM HumanResources.Employee e
JOIN Person.Person AS p ON e.BusinessEntityID = p.BusinessEntityID ;
GO
CREATE VIEW view1
AS
SELECT fis.CustomerKey, fis.ProductKey, fis.OrderDateKey,
fis.SalesTerritoryKey, dst.SalesTerritoryRegion
FROM FactInternetSales AS fis
LEFT OUTER JOIN DimSalesTerritory AS dst
ON (fis.SalesTerritoryKey=dst.SalesTerritoryKey);
GO
SELECT ROW_NUMBER() OVER(PARTITION BY PostalCode ORDER BY SalesYTD DESC) AS "Row Number",
p.LastName, s.SalesYTD, a.PostalCode
FROM Sales.SalesPerson AS s
INNER JOIN Person.Person AS p
ON s.BusinessEntityID = p.BusinessEntityID
INNER JOIN Person.Address AS a
ON a.AddressID = p.BusinessEntityID
WHERE TerritoryID IS NOT NULL
AND SalesYTD <> 0
ORDER BY PostalCode;

@ -0,0 +1,129 @@
## Python Data lineage: using the SQLFlow REST API (Advanced)
This article illustrates how to discover the data lineage using Python and the SQLFlow REST API.
By using the SQLFlow REST API, you can code in python to discover the data lineage in SQL scripts
and get the result in an actionable diagram, json, csv or graphml format.
You can integrate the Python code provided here into your own project and add powerful
data lineage analysis capability instantly.
### 1. interactive data lineage visualizations
![Python Data lineage](python-data-lineage.png)
### 2. [Data lineage in JSON format](python-data-lineage-result.json)
### 3. Data lineage in CSV, graphml format
## Prerequisites
- [SQLFlow Cloud Server or on-premise version](https://github.com/sqlparser/sqlflow_public/tree/master/api#prerequisites)
- Python 2.7 or higher version must be installed and configured correctly.
- Installing Dependency Libraries:
```
pip install requests
```
### Usage
````
python Grabit.py /s server /p port /u userId /k userSecret /t databaseType /f path_to_config_file /r resultType
eg:
python Grabit.py /u 'auth0|xxx' /k cab9712c45189014a94a8b7aceeef7a3db504be58e18cd3686f3bbefd078ef4d /s https://api.gudusoft.com /t oracle /f demo.sql /r 1
note:
If a parameter value contains symbols such as "|", it must be enclosed in single quotes (' ')
````
Example:
1. Connect to the SQLFlow Cloud Server
```
python Grabit.py /s https://api.gudusoft.com /u 'YOUR_USER_ID' /k YOUR_SECRET_KEY /t sqlserver /f python-data-lineage-sqlserver.sql /r 1
```
2. Connect to the SQLFlow on-premise
This discovers data lineage by analyzing the `python-data-lineage-sqlserver.sql` file. You may also specify a zip file that contains multiple SQL files.
```
python Grabit.py /s http://127.0.0.1 /p 8081 /u 'gudu|0123456789' /t sqlserver /f python-data-lineage-sqlserver.sql /r 1
```
This will discover data lineage by analyzing all SQL files under `sqlfiles` directory.
```
python Grabit.py /s http://127.0.0.1 /p 8081 /u 'gudu|0123456789' /t mysql /f sqlfiles /r 1
```
### Parameters
- **path_to_config_file**
This can be a single SQL file, a zip file including multiple SQL files, or a directory including lots of SQL files.
- **server**
Usually, this is the IP address of [the SQLFlow on-premise version](https://www.gudusoft.com/sqlflow-on-premise-version/)
installed on your own servers, such as `127.0.0.1` or `http://127.0.0.1`.
You may set the value to `https://api.gudusoft.com` if you want to send your SQL script to [the SQLFlow Cloud Server](https://sqlflow.gudusoft.com) to get the data lineage result.
- **port**
The default value is `8081` if you connect to your SQLFlow on-premise server.
However, if you set up an nginx reverse proxy in the nginx configuration file like this:
```
location /api/ {
proxy_pass http://127.0.0.1:8081/;
proxy_connect_timeout 600s ;
proxy_read_timeout 600s;
proxy_send_timeout 600s;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header User-Agent $http_user_agent;
}
```
Then, keep the `port` value empty and set `server` to a value like `http://127.0.0.1/api`.
>Please keep this value empty if you connect to the SQLFlow Cloud Server by specifying `https://api.gudusoft.com`
>in the `server` parameter.
>
- **userId, userSecret**
This is the user id that is used to connect to the SQLFlow server.
Always set this value to `gudu|0123456789` and keep `userSecret` empty if you use the SQLFlow on-premise version.
If you want to connect to [the SQLFlow Cloud Server](https://sqlflow.gudusoft.com), you may [request a 30-day premium account](https://www.gudusoft.com/request-a-premium-account/) to
[get the necessary userId and secret code](/sqlflow-userid-secret.md).
- **databaseType**
This parameter specifies the database dialect of the SQL scripts that SQLFlow will analyze.
```txt
access,bigquery,couchbase,dax,db2,greenplum,hana,hive,impala,informix,mdx,mssql,
sqlserver,mysql,netezza,odbc,openedge,oracle,postgresql,postgres,redshift,snowflake,
sybase,teradata,soql,vertica
```
- **resultType**
When you submit a SQL script to the SQLFlow server, a job is created on the server,
and you can always view the graphical data lineage result in the browser.
In addition, this demo fetches the data lineage back to the directory where the demo is running;
those results are stored in the `data/result/` directory.
This parameter specifies the format used to save the data lineage result.
(A short example of reading the JSON export follows this parameter list.)
Available values for this parameter:
- 1: JSON, data lineage result in JSON.
- 2: CSV, data lineage result in CSV format.
- 3: diagram, in graphml format that can be viewed by yEd.
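
The sketch below shows one way to post-process the JSON export (resultType 1). It is only an illustration: the field names (`data`, `sqlflow`, `relationship`, `sources`, `target`, `parentName`) are taken from the sample payload used by the chain-parsing demo elsewhere in this repository, and the script and file names are hypothetical.
```python
# print_edges.py - list table-to-table edges from an exported JSON lineage file (illustrative sketch).
import json
import sys


def print_edges(path):
    with open(path, encoding='utf-8') as f:
        result = json.load(f)
    # Each relationship links one or more source tables/columns to a target.
    for rel in result.get('data', {}).get('sqlflow', {}).get('relationship', []):
        target = rel['target']['parentName']
        for source in rel.get('sources', []):
            print(source['parentName'], '->', target)


if __name__ == '__main__':
    # e.g. python print_edges.py data/result/20241023_<jobId>_json.json
    print_edges(sys.argv[1])
```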
### SQLFlow REST API
Please check here for the detailed information about the [SQLFlow REST API](https://github.com/sqlparser/sqlflow_public/tree/master/api/sqlflow_api.md)

@ -0,0 +1,232 @@
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import zipfile
import requests
import time
import json
import sys
import os
def toSqlflow(userId, token, server, port, jobName, dbvendor, sqlfiles):
url = '/api/gspLive_backend/sqlflow/job/submitUserJob'
if 'api.gudusoft.com' in server:
url = '/gspLive_backend/sqlflow/job/submitUserJob'
if port != '':
url = server + ':' + port + url
else:
url = server + url
if os.path.isdir(sqlfiles):
sqlfiles = toZip(sqlfiles)
files = {'sqlfiles': open(sqlfiles, 'rb')}
data = {'dbvendor': dbvendor, 'jobName': jobName, 'token': token, 'userId': userId}
datastr = json.dumps(data)
print('start submit job to sqlflow.')
try:
response = requests.post(url, data=eval(datastr), files=files, verify=False)
except Exception:
print('submit job to sqlflow failed.')
sys.exit(0)
result = json.loads(response.text)
if result['code'] == 200:
print('submit job to sqlflow successful.')
return result['data']['jobId']
else:
print(result['error'])
sys.exit(0)
def toZip(start_dir):
if start_dir.endswith(os.sep):
start_dir = start_dir[:-1]
start_dir = start_dir
file_news = start_dir + '.zip'
z = zipfile.ZipFile(file_news, 'w', zipfile.ZIP_DEFLATED)
for dir_path, dir_names, file_names in os.walk(start_dir):
f_path = dir_path.replace(start_dir, '')
f_path = f_path and f_path + os.sep or ''
for filename in file_names:
z.write(os.path.join(dir_path, filename), f_path + filename)
z.close()
return file_news
def getToken(userId, server, port,screctKey):
if userId == 'gudu|0123456789':
return 'token'
url = '/api/gspLive_backend/user/generateToken'
if 'api.gudusoft.com' in server:
url = '/gspLive_backend/user/generateToken'
if port != '':
url = server + ':' + port + url
else:
url = server + url
mapA = {'secretKey': screctKey, 'userId': userId}
header_dict = {"Content-Type": "application/x-www-form-urlencoded"}
print('start get token.')
try:
r = requests.post(url, data=mapA, headers=header_dict, verify=False)
print(r)
except Exception:
print('get token failed.')
result = json.loads(r.text)
if result['code'] == '200':
print('get token successful.')
return result['token']
else:
print(result['error'])
def getResult(dataLineageFileType, userId, token, server, port, jobId, filePath):
sep = 'data' + os.sep + 'result' + os.sep
filePath = filePath + '_' + jobId
if dataLineageFileType == 'json':
url = "/api/gspLive_backend/sqlflow/job/exportLineageAsJson"
if 'api.gudusoft.com' in server:
url = '/gspLive_backend/sqlflow/job/exportLineageAsJson'
filePath = sep + filePath + '_json.json'
elif dataLineageFileType == 'graphml':
url = "/api/gspLive_backend/sqlflow/job/exportLineageAsGraphml"
if 'api.gudusoft.com' in server:
url = '/gspLive_backend/sqlflow/job/exportLineageAsGraphml'
filePath = sep + filePath + '_graphml.graphml'
elif dataLineageFileType == 'csv':
url = "/api/gspLive_backend/sqlflow/job/exportLineageAsCsv"
if 'api.gudusoft.com' in server:
url = '/gspLive_backend/sqlflow/job/exportLineageAsCsv'
filePath = sep + filePath + '_csv.csv'
else:
url = "/api/gspLive_backend/sqlflow/job/exportLineageAsJson"
if 'api.gudusoft.com' in server:
url = '/gspLive_backend/sqlflow/job/exportLineageAsJson'
filePath = sep + filePath + '_json.json'
if port != '':
url = server + ':' + port + url
else:
url = server + url
data = {'jobId': jobId, 'token': token, 'userId': userId, 'tableToTable': 'false'}
datastr = json.dumps(data)
print('start download result to sqlflow.')
try:
response = requests.post(url, data=eval(datastr), verify=False)
except Exception:
print('download result to sqlflow failed.')
sys.exit(0)
if not os.path.exists(sep):
os.makedirs(sep)
try:
with open(filePath, 'wb') as f:
f.write(response.content)
except Exception:
        print(filePath, 'could not be written.')
        sys.exit(0)
    print('download result from sqlflow successful. file path is ', filePath)
def getStatus(userId, token, server, port, jobId):
url = "/api/gspLive_backend/sqlflow/job/displayUserJobSummary"
if 'api.gudusoft.com' in server:
url = '/gspLive_backend/sqlflow/job/displayUserJobSummary'
if port != '':
url = server + ':' + port + url
else:
url = server + url
data = {'jobId': jobId, 'token': token, 'userId': userId}
datastr = json.dumps(data)
try:
response = requests.post(url, data=eval(datastr), verify=False)
except Exception:
print('get job status to sqlflow failed.')
sys.exit(0)
result = json.loads(response.text)
if result['code'] == 200:
status = result['data']['status']
if status == 'fail':
print(result['data']['errorMessage'])
sys.exit(0)
return status
if __name__ == '__main__':
if len(sys.argv) < 1:
print('Please enter the args.')
sys.exit(0)
# the user id of sqlflow web or client, required true
userId = ''
# the secret key of sqlflow user for webapi request, required true
screctKey = ''
# sqlflow server
server = ''
# sqlflow api port
port = ''
# database type
dbvendor = 'dbvmysql'
sqlfile = ''
dataLineageFileType = ''
for i in range(1, len(sys.argv)):
if sys.argv[i] == '/f':
try:
if sys.argv[i + 1] is not None:
sqlfile = sys.argv[i + 1]
except Exception:
print('Please enter the sqlfile pathrequired true. eg: /f sql.txt')
sys.exit(0)
elif sys.argv[i] == '/o':
try:
if sys.argv[i + 1] is not None:
dataLineageFileType = sys.argv[i + 1]
except Exception:
dataLineageFileType = 'json'
token = getToken(userId, server, port, screctKey);
# sqlflow job name
jobName = 'test'
jobId = toSqlflow(userId, token, server, port, jobName, dbvendor, sqlfile)
while 1==1:
status = getStatus(userId, token, server, port, jobId)
if status == 'fail':
print('job execute failed.')
break;
elif status == 'success':
print('job execute successful.')
break;
elif status == 'partial_success':
print('job execute partial successful.')
break;
time.sleep(2)
# data lineage file path
filePath = 'datalineage'
getResult(dataLineageFileType, userId, token, server, port, jobId, filePath)

@ -0,0 +1,57 @@
import zipfile
import sys
import os
def toZip(start_dir):
if start_dir.endswith(os.sep):
start_dir = start_dir[:-1]
start_dir = start_dir
file_news = start_dir + '.zip'
z = zipfile.ZipFile(file_news, 'w', zipfile.ZIP_DEFLATED)
for dir_path, dir_names, file_names in os.walk(start_dir):
f_path = dir_path.replace(start_dir, '')
f_path = f_path and f_path + os.sep or ''
for filename in file_names:
z.write(os.path.join(dir_path, filename), f_path + filename)
z.close()
return file_news
def buildSqltextParam(userId, token, delimiter, export_include_table, showConstantTable,
treatArgumentsInCountFunctionAsDirectDataflow, dbvendor, sqltext):
data = {'dbvendor': dbvendor, 'token': token, 'userId': userId}
if delimiter != '':
data['delimiter'] = delimiter
if export_include_table != '':
data['export_include_table'] = export_include_table
if showConstantTable != '':
data['showConstantTable'] = showConstantTable
if treatArgumentsInCountFunctionAsDirectDataflow != '':
data['treatArgumentsInCountFunctionAsDirectDataflow'] = treatArgumentsInCountFunctionAsDirectDataflow
if sqltext != '':
data['sqltext'] = sqltext
return data
def buildSqlfileParam(userId, token, delimiter, export_include_table, showConstantTable,
treatArgumentsInCountFunctionAsDirectDataflow, dbvendor, sqlfile):
files = ''
if sqlfile != '':
if os.path.isdir(sqlfile):
print('The SQL file cannot be a directory.')
sys.exit(0)
files = {'sqlfile': open(sqlfile, 'rb')}
data = {'dbvendor': dbvendor, 'token': token, 'userId': userId}
if delimiter != '':
data['delimiter'] = delimiter
if export_include_table != '':
data['export_include_table'] = export_include_table
if showConstantTable != '':
data['showConstantTable'] = showConstantTable
if treatArgumentsInCountFunctionAsDirectDataflow != '':
data['treatArgumentsInCountFunctionAsDirectDataflow'] = treatArgumentsInCountFunctionAsDirectDataflow
return data, files

@ -0,0 +1,44 @@
import requests
import json
def getToken(userId, server, port, screctKey):
if userId == 'gudu|0123456789':
return 'token'
url = '/api/gspLive_backend/user/generateToken'
if 'api.gudusoft.com' in server:
url = '/gspLive_backend/user/generateToken'
if port != '':
url = server + ':' + port + url
else:
url = server + url
mapA = {'secretKey': screctKey, 'userId': userId}
header_dict = {"Content-Type": "application/x-www-form-urlencoded"}
try:
r = requests.post(url, data=mapA, headers=header_dict, verify=False)
except Exception as e:
print('get token failed.', e)
raise
result = json.loads(r.text)
if result['code'] == '200':
return result['token']
else:
print(result['error'])
if __name__ == '__main__':
server = ''
port = ''
# the user id of sqlflow web or client, required true
userId = ''
# the secret key of sqlflow user for webapi request, required true
screctKey = ''
token = getToken(userId, server, port, screctKey)
print(token)


@ -0,0 +1,61 @@
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import requests
import json
import GenerateToken
def check(server, port, sql, dbvendor, userId, token):
url = "/api/gspLive_backend/demo/syntax/check"
if 'api.gudusoft.com' in server:
url = '/gspLive_backend/demo/syntax/check'
if port != '':
url = server + ':' + port + url
else:
url = server + url
data = {'sql': sql, 'dbvendor': dbvendor, 'userId': userId, 'token': token}
header_dict = {"Content-Type": "application/x-www-form-urlencoded;charset=UTF-8"}
try:
r = requests.post(url, data=data, headers=header_dict, verify=False)
except Exception as e:
print('syntax check request failed.', e)
return
result = json.loads(r.text)
if result['code'] == 200:
usedTime = result['data']['usedTime']
version = result['data']['gsp.version']
print('syntax correct. elapsed time: ' + usedTime+' ,gsp version: ' + version)
else:
usedTime = result['data']['usedTime']
version = result['data']['gsp.version']
print('syntax error. elapsed time: ' + usedTime + ' ,gsp version: ' + version + ' ,error info:')
errorInfos = result['data']['errorInfos']
for error in errorInfos:
print(error['errorMessage'])
if __name__ == '__main__':
# the user id of sqlflow web or client, required true
userId = ''
# the secret key of sqlflow user for webapi request, required true
screctKey = ''
# sqlflow server, For the cloud version, the value is https://api.gudusoft.com
server = 'https://api.gudusoft.com'
# sqlflow api port, For the cloud version, the value is 80
port = ''
# The token is generated from userid and usersecret. It is used in every Api invocation.
token = GenerateToken.getToken(userId, server, port, screctKey)
# sql to be checked (contains a deliberate syntax error for the demo)
sql = 'select * fro1m table1'
# database type, dbvansi,dbvathena,dbvazuresql,dbvbigquery,dbvcouchbase,dbvdb2,dbvgreenplum,dbvgaussdb,dbvhana,dbvhive,dbvimpala,dbvinformix,dbvmdx,dbvmysql,dbvnetezza,dbvopenedge,dbvoracle,dbvpresto,dbvpostgresql,dbvredshift,dbvsnowflake,dbvmssql,dbvsparksql,dbvsybase,dbvteradata,dbvvertica
dbvendor = 'dbvoracle'
# check syntax
check(server, port, sql, dbvendor, userId, token)

105
api/python/basic/getcsv.py Normal file

@ -0,0 +1,105 @@
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import requests
import json
import sys
import GenerateToken
import GenerateLineageParam
def getResult(server, port, data, files):
url = "/api/gspLive_backend/sqlflow/generation/sqlflow/exportFullLineageAsCsv"
if 'api.gudusoft.com' in server:
url = '/gspLive_backend/sqlflow/generation/sqlflow/exportFullLineageAsCsv'
if port != '':
url = server + ':' + port + url
else:
url = server + url
datastr = json.dumps(data)
print('start get csv result from sqlflow.')
try:
if files != '':
response = requests.post(url, data=eval(datastr), files=files, verify=False)
else:
response = requests.post(url, data=eval(datastr), verify=False)
except Exception as e:
print('get csv result from sqlflow failed.', e)
sys.exit(0)
print('get csv result from sqlflow successful. result : ')
print()
return response.text
if __name__ == '__main__':
# the user id of sqlflow web or client, required true
userId = ''
# the secret key of sqlflow user for webapi request, required true
screctKey = ''
# sqlflow server, For the cloud version, the value is https://api.gudusoft.com
server = 'http://127.0.0.1'
# sqlflow api port, For the cloud version, the value is 443
port = '8165'
# For the cloud version
# server = 'https://api.gudusoft.com'
# port = '80'
# The token is generated from userid and usersecret. It is used in every Api invocation.
token = GenerateToken.getToken(userId, server, port, screctKey)
# delimiter of the values in CSV, default would be ',' string
delimiter = ','
# export_include_table, string
export_include_table = ''
# showConstantTable, boolean
showConstantTable = 'true'
# Whether treat the arguments in COUNT function as direct Dataflow, boolean
treatArgumentsInCountFunctionAsDirectDataflow = ''
# database type,
# dbvazuresql
# dbvbigquery
# dbvcouchbase
# dbvdb2
# dbvgreenplum
# dbvhana
# dbvhive
# dbvimpala
# dbvinformix
# dbvmdx
# dbvmysql
# dbvnetezza
# dbvopenedge
# dbvoracle
# dbvpostgresql
# dbvredshift
# dbvsnowflake
# dbvmssql
# dbvsparksql
# dbvsybase
# dbvteradata
# dbvvertica
dbvendor = 'dbvoracle'
# sql text
# sqltext = 'select * from table'
# data = GenerateLineageParam.buildSqltextParam(userId, token, delimiter, export_include_table, showConstantTable, treatArgumentsInCountFunctionAsDirectDataflow, dbvendor, sqltext)
# resp = getResult(server, port, data, '')
# sql file
sqlfile = 'test.sql'
data, files = GenerateLineageParam.buildSqlfileParam(userId, token, delimiter, export_include_table,
showConstantTable,
treatArgumentsInCountFunctionAsDirectDataflow, dbvendor,
sqlfile)
resp = getResult(server, port, data, files)
print(resp)

104
api/python/basic/readme.md Normal file

@ -0,0 +1,104 @@
## Python Data lineage: using the SQLFlow REST API (Basic)
A basic tutorial for using the Python version of the SQLFlow API.
An advanced version of this tutorial is also available: [Python to get the data lineage](https://github.com/sqlparser/sqlflow_public/tree/master/api/python/advanced).
### Prerequisites
- Python 2.7 or higher must be installed and configured correctly.
- Install the dependency library:
```
pip install requests
```
### GenerateTokenDemo.py
This demo shows how to get a token from the SQLFlow system; the token is required in order to call the other interfaces.
* Parameters:
* **userId**: the user id of sqlflow web or client, required **true**
* **userSecret**: the secret key of the sqlflow user for webapi requests. sqlflow web: required false; sqlflow client: required true
This is the user id that is used to connect to the SQLFlow server.
Always set this value to `gudu|0123456789` and keep `userSecret` empty if you use the SQLFlow on-premise version.
If you want to connect to [the SQLFlow Cloud Server](https://sqlflow.gudusoft.com), you may [request a 30-day premium account](https://www.gudusoft.com/request-a-premium-account/) to
[get the necessary userId and secret code](/sqlflow-userid-secret.md).
**set the parameters in the code**
Connect to the SQLFlow Cloud Server:
````python
url = 'https://api.gudusoft.com/gspLive_backend/user/generateToken'
userId = 'YOUR USER ID'
screctKey = 'YOUR SECRET KEY'
````
Connect to the SQLFlow on-premise version:
````python
url = 'http://127.0.0.1:8081/gspLive_backend/user/generateToken'
userId = 'gudu|0123456789'
screctKey = ''
````
**start script**
`python GenerateTokenDemo.py`
### GenerateDataLineageDemo.py
This demo shows how to get the desired SQL script analysis results from the SQLFlow system.
* Parameters:
* **userId**: the user id of sqlflow web or client, required **true**
* **userSecret**: the secret key of the sqlflow user for webapi requests. sqlflow web: required false; sqlflow client: required true
* sqltext: sql text, required false
* sqlfile: sql file, required false
* **dbvendor**: database vendor, required **true**, available values:
* dbvbigquery, dbvcouchbase,dbvdb2,dbvgreenplum,dbvhana,dbvhive,dbvimpala,dbvinformix,dbvmdx,dbvmysql,dbvnetezza,dbvopenedge,dbvoracle,dbvpostgresql,dbvredshift,dbvsnowflake,dbvmssql,dbvsybase,dbvteradata,dbvvertica
* filePath: data lineage file path
**set the parameters in the code**
Connect to the SQLFlow Cloud Server:
````python
tokenUrl = 'https://api.gudusoft.com/gspLive_backend/user/generateToken'
generateDataLineageUrl = 'https://api.gudusoft.com/gspLive_backend/sqlflow/generation/sqlflow'
userId = 'YOUR USER ID'
screctKey = 'YOUR SECRET KEY'
sqlfile = 'test.sql'
dbvendor = 'dbvoracle'
filePath = 'datalineage'
````
Connect to the SQLFlow on-premise version:
````python
tokenUrl = 'http://127.0.0.1:8081/gspLive_backend/user/generateToken'
generateDataLineageUrl = 'http://127.0.0.1:8081/gspLive_backend/sqlflow/generation/sqlflow'
userId = 'gudu|0123456789'
screctKey = ''
sqlfile = 'test.sql'
dbvendor = 'dbvoracle'
filePath = 'datalineage'
````
**start script**
cmd:
- /f: the sql file path, required. eg: /f sql.txt
- /o: the data lineage file type, optional, default value is json. eg: /o csv , /o json
eg:
`python GenerateDataLineageDemo.py /f test.sql /o csv`
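**programmatic use**
The same flow can also be driven from another Python script instead of the command line. Below is a minimal sketch (a hypothetical wrapper, not part of the demo), assuming `GenerateDataLineageDemo.py` is importable and that `getToken`, `toSqlflow`, `getStatus` and `getResult` keep the signatures used in its `__main__` block:
````python
# Hypothetical wrapper around GenerateDataLineageDemo.py (a sketch, not repository code).
import time

import GenerateDataLineageDemo as demo

userId = 'gudu|0123456789'   # on-premise default user id
screctKey = ''               # keep empty for the on-premise version
server = 'http://127.0.0.1'
port = '8081'

token = demo.getToken(userId, server, port, screctKey)
jobId = demo.toSqlflow(userId, token, server, port, 'test', 'dbvmysql', 'test.sql')

# poll until the job reaches a terminal state
while True:
    status = demo.getStatus(userId, token, server, port, jobId)
    if status in ('fail', 'success', 'partial_success'):
        break
    time.sleep(2)

# write the lineage result (json or csv) under the 'datalineage' directory
demo.getResult('json', userId, token, server, port, jobId, 'datalineage')
````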

59
api/python/basic/toxml.py Normal file

@ -0,0 +1,59 @@
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import requests
import json
import GenerateToken
def toxml(server, port, sql, dbvendor, userId, token):
url = "/api/gspLive_backend/demo/xml/toXML"
if 'api.gudusoft.com' in server:
url = '/gspLive_backend/demo/xml/toXML'
if port != '':
url = server + ':' + port + url
else:
url = server + url
data = {'sql': sql, 'dbvendor': dbvendor, 'userId': userId, 'token': token}
header_dict = {"Content-Type": "application/x-www-form-urlencoded;charset=UTF-8"}
try:
r = requests.post(url, data=data, headers=header_dict, verify=False)
except Exception as e:
print('convert failed.', e)
return
result = json.loads(r.text)
usedTime = result['data']['usedTime']
version = result['data']['gsp.version']
if result['code'] == 200:
xml = result['data']['xml']
print('elapsed time: ' + usedTime+' ,gsp version: ' + version + ' ,xml result: ')
print(xml)
else:
print('to xml failed. elapsed time: ' + usedTime + ' ,gsp version: ' + version + ' ,error info: ')
print(result['error'])
if __name__ == '__main__':
# the user id of sqlflow web or client, required true
userId = ''
# the secret key of sqlflow user for webapi request, required true
screctKey = ''
# sqlflow server, For the cloud version, the value is https://api.gudusoft.com
server = 'https://api.gudusoft.com'
# sqlflow api port, For the cloud version, the value is 80
port = ''
# The token is generated from userid and usersecret. It is used in every Api invocation.
token = GenerateToken.getToken(userId, server, port, screctKey)
# sql to be converted to xml
sql = 'select * from table1'
# database type, dbvansi,dbvathena,dbvazuresql,dbvbigquery,dbvcouchbase,dbvdb2,dbvgreenplum,dbvgaussdb,dbvhana,dbvhive,dbvimpala,dbvinformix,dbvmdx,dbvmysql,dbvnetezza,dbvopenedge,dbvoracle,dbvpresto,dbvpostgresql,dbvredshift,dbvsnowflake,dbvmssql,dbvsparksql,dbvsybase,dbvteradata,dbvvertica
dbvendor = 'dbvoracle'
# to xml
toxml(server, port, sql, dbvendor, userId, token)


@ -0,0 +1,163 @@
## THIS VERSION IS DEPRECATED, PLEASE USE THE CODE IN THE BASIC OR ADVANCED DIRECTORY
========================================================================================================================================================================================================
SQLFlow API Python Client Documentation
========================================================================================================================================================================================================
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DESCRIPTION
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
High-level Python client of the SQLFlow API.
SQLFlow is a product of Gudusoft. The software's purpose is to analyze the flow of data, data relationships and dependencies coded into various SQL scripts.
This Python wrapper is built to process SQL scripts using the API with the option to export the API responses into JSON files.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
BASIC USAGE
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
The Python client is built into a single module. To use it, one must have a valid API key (currently available for the community at https://github.com/sqlparser/sqlflow_public/tree/master/api/client/csharp).
****************************************************************************************************
SQLFlowClient(api_key, api_url) class stores relevant parameters and methods to utilize SQLFlow API.
It has all the default values included for both the API key (which is currently available to the public) and the API base URL.
Initializing it will create an object with the following variables: API key and API URL; it will also initialize the default request header and a default API parameter configuration.
****************************************************************************************************
configure_api(db_vendor, rel_type, simple_output, ignore_rs) method is provided to change the default API parameters as required. It updates the pre-set API configuration based on the provided parameter values.
Detailed explanations regarding API configuration could be found here: https://github.com/sqlparser/sqlflow_public/tree/master/api/client/csharp and here: https://api.gudusoft.com/gspLive_backend/swagger-ui.html#!/sqlflow-controller/generateSqlflowUsingPOST.
While using the method, one must provide all four parameters. Omitting one will result in an error, and passing an invalid value will prevent the client from configuring the API request; in both cases a notification message will be returned.
Valid parameters are as follows:
- db_vendor: dbvbigquery, dbvcouchbase, dbvdb2, dbvgreenplum, dbvhana, dbvhive, dbvimpala, dbvinformix, dbvmdx, dbvmysql, dbvnetezza, dbvopenedge, dbvoracle, dbvpostgresql, dbvredshift, dbvsnowflake, dbvmssql, dbvsybase, dbvteradata, dbvvertica
- rel_type: fdd, fdr, frd, fddi, join
- simple_output: true, false
- ignore_rs: true, false
****************************************************************************************************
analyze_script(script_path) method can be used to submit a SQL script to the SQLFlow API for analysis. If the analysis returns a response successfully, the results will be stored in the SQLFlowClient object's results variable. The results variable is a dictionary containing script paths and API responses as key-value pairs.
The method won't run if the built-in check determines that the provided file path does not point to a SQL script; a notification message is returned instead.
If the API call results in an error (e.g. invalid API key, server being busy), the response won't be stored, but a notification message will be returned instead.
****************************************************************************************************
export_results(export_folder) method simply dumps all the API call results already stored in the SQLFlowClient's results variable to the specified output folder path.
The API responses will be saved as JSON files, with filenames corresponding to their source scripts.
If the provided path doesn't exist, the method will automatically build the path.
If there are no stored responses yet, the function won't run and will return a notification message.
****************************************************************************************************
mass_process_scripts(source_folder, export_folder = None) method will scan the entire directory tree of the provided source folder for SQL script files and submit each to the API, storing all the responses in the results variable.
It can optionally export the results of the detected scripts to a desired export folder. If export_folder is left as None, this operation will be skipped.
Please note that this method will only export the API results of scripts which were discovered in the specified directory at the time of the call.
****************************************************************************************************
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CODE EXAMPLES
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
# Initialize API client
client = SQLFlowClient()
# =============================================================================
# Configure the API parameters
client.configure_api('dbvmssql', 'fddi', 'false', 'false')
# Check config values after setting the parameters
print(client.config)
# =============================================================================
# Execute the analysis of a single script file
client.analyze_script('C:/Users/TESTUSER/Desktop/EXAMPLESCRIPT.sql')
# Check stored API response of the previous step
print(client.results)
# =============================================================================
# Export the stored response
client.export_results('C:/Users/TESTUSER/Desktop/EXPORTFOLDER')
# =============================================================================
# Execute mass processing of SQL scripts in a folder with an export folder specified
client.mass_process_scripts('C:/Users/TESTUSER/Desktop/SOURCEFOLDER', 'C:/Users/TESTUSER/Desktop/EXPORTFOLDER')
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
AUTHORS
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bence Kiss (vencentinus@gmail.com)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ADDITIONAL INFORMATION
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Detailed information about the SQLFlow project can be accessed via the following links:
API configuration https://api.gudusoft.com/gspLive_backend/swagger-ui.html#!/sqlflow-controller/generateSqlflowUsingPOST
SQLFlow Git repo https://github.com/sqlparser/sqlflow_public
Dataflow relationship types https://github.com/sqlparser/sqlflow_public/blob/master/dbobjects_relationship.md
SQLFlow front end http://www.gudusoft.com/sqlflow/#/
C# API client https://github.com/sqlparser/sqlflow_public/tree/master/api/client/csharp
In case of any questions regarding SQLFlow please contact Mr. James Wang at info@sqlparser.com.
In case of bugs, comments, questions etc. please feel free to contact the author at vencentinus@gmail.com or Mr. James Wang at info@sqlparser.com.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ACKNOWLEDGEMENTS
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
The author of this project acknowledges that SQLFlow is a product and intellectual property exclusively of Gudusoft.
This project has been created to facilitate the utilization of the tool by the community, and the author of this Python client neither received nor expects to receive any compensation from Gudusoft in exchange.
This development has been created in good faith and with the intention to contribute to a great project, which the author of this wrapper has been using for free during its development period.
The code is free to use for anyone intending to use SQLFlow API in any manner.
Thanks to Mr. James Wang, CTO of Gudusoft, for his kind support and for allowing me to utilize the tool during its development and contribute to his company's project.


@ -0,0 +1,371 @@
'''
************************************************************************************************************************************************************
Properties
================
NAME: SQLFlow API Python Client
DESCRIPTION: A simple wrapper written for Gudusoft's SQLFlow API.
AUTHOR: Bence Kiss
ORIGIN DATE: 21-MAR-2020
PYTHON VERSION: 3.7.3
Additional Notes
================
-
ADDITIONAL INFORMATION
============================================================================================================================================================
Resources URL
============================== ============================================================================================================================
API configuration https://api.gudusoft.com/gspLive_backend/swagger-ui.html#!/sqlflow-controller/generateSqlflowUsingPOST
------------------------------ ----------------------------------------------------------------------------------------------------------------------------
SQLFlow Git repo https://github.com/sqlparser/sqlflow_public
------------------------------ ----------------------------------------------------------------------------------------------------------------------------
Dataflow relationship types https://github.com/sqlparser/sqlflow_public/blob/master/dbobjects_relationship.md
------------------------------ ----------------------------------------------------------------------------------------------------------------------------
SQLFlow front end http://www.gudusoft.com/sqlflow/#/
------------------------------ ----------------------------------------------------------------------------------------------------------------------------
C# API client https://github.com/sqlparser/sqlflow_public/tree/master/api/client/csharp
------------------------------ ----------------------------------------------------------------------------------------------------------------------------
REVISION HISTORY
============================================================================================================================================================
Version Change Date Author Narrative
======= =============== ====== ============================================================================================================================
1.0.0 21-MAR-2020 BK Created
------- --------------- ------ ----------------------------------------------------------------------------------------------------------------------------
0.0.0 DD-MMM-YYYY XXX What changed and why...
------- --------------- ------ ----------------------------------------------------------------------------------------------------------------------------
************************************************************************************************************************************************************
'''
# ==========================================================================================================================================================
# Import required modules
import os
import requests
import json
# ==========================================================================================================================================================
class SQLFlowClient:
'''
Class description
------------------------------------------------------------------------------------------------------------------------------------------------------------
Class containing various functions to use SQLFlow API.
Class instance variables
------------------------------------------------------------------------------------------------------------------------------------------------------------
- api_key: The token needed for authorization. Default public token can be found here:
https://github.com/sqlparser/sqlflow_public/tree/master/api/client/csharp
- api_url: Default base URL of the API requests. Can be changed at class initialization.
Class methods
------------------------------------------------------------------------------------------------------------------------------------------------------------
- configure_api: Set the API parameters for the requests.
- analyze_script: Submit a single SQL script using POST request to the API. Responses are stored in the class instance's results variable.
- export_responses: Export all stored API responses to a target folder as JSON files.
- mass_process_scripts: Process all SQL scripts found in a directory tree, optionally exporting results to a designated folder.
Class dependencies
------------------------------------------------------------------------------------------------------------------------------------------------------------
Packages used in the script are core Python packages, apart from requests, which is a widely used third-party library.
- os: Used to handle input/output file and folder paths.
- requests: Used to generate POST requests and submit script files to the API.
- json: Used to process API responses when it comes to exporting.
'''
# ==========================================================================================================================================================
# ==========================================================================================================================================================
def __init__(self,
api_key = 'eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhdWQiOiJndWR1c29mdCIsImV4cCI6MTYwMzc1NjgwMCwiaWF0IjoxNTcyMjIwODAwfQ.EhlnJO7oqAHdr0_bunhtrN-TgaGbARKvTh2URTxu9iU',
api_url = 'https://api.gudusoft.com/gspLive_backend/sqlflow/generation/sqlflow'
):
'''
------------------------------------------------------------------------------------------------------------------------------------------------------------
Initialize SQLFlow API client.
------------------------------------------------------------------------------------------------------------------------------------------------------------
'''
# Set instance variables
self.key = api_key
self.url = api_url
# Set default request header
self.headers = {'Accept': 'application/json;charset=utf-8',
'Authorization': self.key
}
# =============================================================================
# Set lists of allowed API configuration values
# List of allowed database vendors
self.dbvendors = ['dbvbigquery',
'dbvcouchbase',
'dbvdb2',
'dbvgreenplum',
'dbvhana',
'dbvhive',
'dbvimpala',
'dbvinformix',
'dbvmdx',
'dbvmysql',
'dbvnetezza',
'dbvopenedge',
'dbvoracle',
'dbvpostgresql',
'dbvredshift',
'dbvsnowflake',
'dbvmssql',
'dbvsybase',
'dbvteradata',
'dbvvertica'
]
# List of allowed data relationship types
self.reltypes = ['fdd',
'fdr',
'frd',
'fddi',
'join'
]
# List of allowed values for Boolean parameters
self.switches = ['true',
'false'
]
# =============================================================================
# Set default API configuration
self.config = {'dbvendor': 'dbvmssql',
'showRelationType': 'fdd',
'simpleOutput': 'false',
'ignoreRecordSet': 'false'
}
# Variable to store API responses
self.results = dict()
# ==========================================================================================================================================================
# ==========================================================================================================================================================
def configure_api(self,
db_vendor,
rel_type,
simple_output,
ignore_rs
):
'''
------------------------------------------------------------------------------------------------------------------------------------------------------------
Configure the API request parameters. Only works if all parameters are provided.
------------------------------------------------------------------------------------------------------------------------------------------------------------
'''
# Check if the provided configuration values are valid
if db_vendor in self.dbvendors and rel_type in self.reltypes and simple_output in self.switches and ignore_rs in self.switches:
# Assign valid configuration parameters to config variable
self.config = {'dbvendor': db_vendor,
'showRelationType': rel_type,
'simpleOutput': simple_output,
'ignoreRecordSet': ignore_rs
}
# If any of the provided parameters are invalid, quit function and notify user
else:
print('\n\n' + '=' * 75 + '\n\nOne or more configuration values are missing or invalid. Please try again.\n\nAllowed values for db_vendor:\n\n' +
' / '.join(self.dbvendors) +
'\n\nAllowed values for relation_type:\n\n' +
' / '.join(self.reltypes) +
'\n\nAllowed values for simple_output and ignore_rs:\n\n' +
' / '.join(self.switches) +
'\n\n' + '=' * 75
)
# ==========================================================================================================================================================
# ==========================================================================================================================================================
def analyze_script(self,
script_path
):
'''
------------------------------------------------------------------------------------------------------------------------------------------------------------
Submit SQL script file for SQLFlow analysis.
------------------------------------------------------------------------------------------------------------------------------------------------------------
'''
# Compile the API request URL
configuredURL = self.url + '?' + ''.join(str(parameter) + '=' + str(setting) + '&' for parameter, setting in self.config.items()).rstrip('&')
# =============================================================================
# Check if provided path points to a SQL script file
if os.path.isfile(script_path) and script_path.lower().endswith('.sql'):
# Open the script file in binary mode so it could be submitted in a POST request
with open(script_path, mode = 'rb') as scriptFile:
# Use requests module's POST function to submit file and retrieve API response
response = requests.post(configuredURL, files = {'sqlfile': scriptFile}, headers = self.headers)
# =============================================================================
# Add the request response to the class variable if response was OK
if response.status_code == 200:
self.results[script_path] = json.loads(response.text)
# If response returned a different status, quit function and notify user
else:
print('\nAn invalid response was returned for < ' + os.path.basename(script_path) + ' >.\n', '\nStatus code: ' + str(response.status_code) + '\n')
# If script file's path is invalid, quit function and notify user
else:
print('\nProvided path is not pointing to a SQL script file. Please try again.\n')
# ==========================================================================================================================================================
# ==========================================================================================================================================================
def export_results(self,
export_folder
):
'''
------------------------------------------------------------------------------------------------------------------------------------------------------------
Export all stored API responses as JSON files to a specified folder.
------------------------------------------------------------------------------------------------------------------------------------------------------------
'''
# Check if there are responses to be exported
if len(self.results) != 0:
# Create the directory for the result files if it doesn't exist
os.makedirs(export_folder, exist_ok = True)
# =============================================================================
# Iterate the API results stored in the class
for scriptpath, response in self.results.items():
# Create a JSON file and export API results of each processed script file into the JSON file
with open(os.path.join(export_folder, os.path.basename(scriptpath).replace('.sql', '') + '.json'), mode = 'w') as resultFile:
# Write the response into the JSON file
json.dump(response, resultFile)
# If there are no responses yet, quit function and notify user
else:
print('\nThere are no API responses stored by the client yet.\n')
# ==========================================================================================================================================================
# ==========================================================================================================================================================
def mass_process_scripts(self,
source_folder,
export_folder = None):
'''
------------------------------------------------------------------------------------------------------------------------------------------------------------
Scan a directory tree for SQL script files and pass each to an API call. Optionally export results to a desired folder.
------------------------------------------------------------------------------------------------------------------------------------------------------------
'''
# List to store SQL script file paths found in source folder
scriptPaths = list()
# =============================================================================
# Scan source folder and subfolders
for (dirpath, dirnames, filenames) in os.walk(source_folder):
# Collect all paths which refer SQL scripts
scriptPaths += [os.path.join(dirpath, file) for file in filenames if os.path.isfile(os.path.join(dirpath, file)) and file.lower().endswith('.sql')]
# =============================================================================
# If there is at least one SQL script in the directory tree execute API call
if len(scriptPaths) != 0:
# Iterate the SQL script paths and call the API for each file
[self.analyze_script(script_path = path) for path in scriptPaths]
# =============================================================================
# If an export folder is provided, save the responses to that folder (but only those which have been analyzed at function call)
if export_folder:
# Store the current set of API responses
allResults = self.results
# Filter for responses related to current function call
self.results = {scriptpath: response for scriptpath, response in self.results.items() if scriptpath in scriptPaths}
# Export the responses of the current function call to the desired target folder
self.export_results(export_folder = export_folder)
# Reset the results variable to contain all responses again
self.results = allResults
# If no SQL script files were found in the directory tree, quit function and notify user
else:
print('\nNo SQL script files have been found in the specified source folder and its subfolders.\n')

121
api/readme.md Normal file

@ -0,0 +1,121 @@
## How to use the Rest API of SQLFlow
This article describes how to use the Rest API provided by SQLFlow to
communicate with the SQLFlow server and get the generated metadata and data lineage.
In this article, we use `Curl` to demonstrate the usage of the Rest API;
you can use any programming language you prefer.
### Prerequisites
In order to use the SQLFlow rest API, you may connect to the [**SQLFlow Cloud server**](https://sqlflow.gudusoft.com),
or set up a [**SQLFlow on-premise version**](https://www.gudusoft.com/sqlflow-on-premise-version/) on your own server.
1. **SQLFlow Cloud server**
- User ID
- Secret Key
If you want to connect to [the SQLFlow Cloud Server](https://sqlflow.gudusoft.com), you may [request a 30-day premium account](https://www.gudusoft.com/request-a-premium-account/) to
[get the necessary userId and secret code](/sqlflow-userid-secret.md).
2. **SQLFlow on-premise version**
Please [check here](https://github.com/sqlparser/sqlflow_public/blob/master/install_sqlflow.md) to see how to install the SQLFlow on-premise version on your own server.
- User ID
- Secret Key
Always set userId to `gudu|0123456789` and keep `userSecret` empty when connecting to the SQLFlow on-premise version.
### Difference of the API calls between SQLFlow Cloud server and SQLFlow on-premise version
1. TOKEN is not needed in the API calls when connecting to the SQLFlow on-premise version
2. userId is always set to `gudu|0123456789` and `userSecret` is left empty when connecting to the SQLFlow on-premise version.
3. The server port is 8081 by default for the SQLFlow on-premise version, and there is no need to specify the port when connecting to the SQLFlow Cloud server.
Regarding the server port of the SQLFlow on-premise version, please [check here](https://github.com/sqlparser/sqlflow_public/tree/master/grabit#1-sqlflow-server) for more information.
### Using the Rest API
#### 1. Generate a token
Once you have the `userid` and `secret key`, the first API you need to call is:
```
/gspLive_backend/user/generateToken
```
This API will return a temporary token that needs to be used in the API call thereafter.
**SQLFlow Cloud Server**
```
curl -X POST "https://api.gudusoft.com/gspLive_backend/user/generateToken" -H "Request-Origion:testClientDemo" -H "accept:application/json;charset=utf-8" -H "Content-Type:application/x-www-form-urlencoded;charset=UTF-8" -d "secretKey=YOUR SECRET KEY" -d "userId=YOUR USER ID HERE"
```
**SQLFlow on-premise version**
TOKEN is not needed in the on-premise version. So, there is no need to generate a token.
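If you prefer Python over `Curl`, the same token request can be made with the `requests` library. This is only a sketch that mirrors the curl example above; the userId and secret key placeholders are yours to fill in:
```python
# Sketch of the generateToken call in Python, mirroring the curl example above.
import requests

url = 'https://api.gudusoft.com/gspLive_backend/user/generateToken'
payload = {'secretKey': 'YOUR SECRET KEY', 'userId': 'YOUR USER ID HERE'}
headers = {'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8'}

r = requests.post(url, data=payload, headers=headers)
token = r.json().get('token')   # use this token in the API calls thereafter
print(token)
```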
#### 2. Generate the data lineage
Call this API by sending the SQL query to get a result that includes the data lineage.
```
/gspLive_backend/sqlflow/generation/sqlflow
```
**SQLFlow Cloud Server**
```
curl -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/generation/sqlflow?showRelationType=fdd" -H "Request-Origion:testClientDemo" -H "accept:application/json;charset=utf-8" -H "Content-Type:multipart/form-data" -F "sqlfile=" -F "dbvendor=dbvoracle" -F "ignoreRecordSet=false" -F "simpleOutput=false" -F "sqltext=CREATE VIEW vsal as select * from emp" -F "userId=YOUR USER ID HERE" -F "token=YOUR TOKEN HERE"
```
**SQLFlow on-premise version**
```
curl -X POST "http://127.0.0.1:8081/gspLive_backend/sqlflow/generation/sqlflow?showRelationType=fdd" -H "Request-Origion:testClientDemo" -H "accept:application/json;charset=utf-8" -H "Content-Type:multipart/form-data" -F "sqlfile=" -F "dbvendor=dbvoracle" -F "ignoreRecordSet=false" -F "simpleOutput=false" -F "sqltext=CREATE VIEW vsal as select * from emp" -F "userId=gudu|0123456789"
```
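The same call can be sketched in Python with the `requests` library (the form fields are taken from the curl examples above; drop the token field for the on-premise version):
```python
# Sketch of the data lineage generation call in Python, based on the curl examples above.
import requests

url = 'https://api.gudusoft.com/gspLive_backend/sqlflow/generation/sqlflow'
params = {'showRelationType': 'fdd'}
form = {
    'dbvendor': 'dbvoracle',
    'ignoreRecordSet': 'false',
    'simpleOutput': 'false',
    'sqltext': 'CREATE VIEW vsal as select * from emp',
    'userId': 'YOUR USER ID HERE',
    'token': 'YOUR TOKEN HERE',
}
result = requests.post(url, params=params, data=form).json()
print(result['code'])   # 200 means success; the lineage model is in the rest of the response
```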
#### 3. Export the data lineage in csv format
Call this API by sending the SQL file to get a csv result that includes the data lineage.
```
/gspLive_backend/sqlflow/generation/sqlflow/exportLineageAsCsv
```
```
curl -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/generation/sqlflow/exportLineageAsCsv" -H "accept:application/json;charset=utf-8" -H "Content-Type:multipart/form-data" -F "userId=YOUR USER ID HERE" -F "token=YOUR TOKEN HERE" -F "dbvendor=dbvoracle" -F "showRelationType=fdd" -F "sqlfile=@YOUR UPLOAD FILE PATH HERE" --output YOUR DOWNLOAD FILE PATH HERE
```
Sample:
```
curl -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/generation/sqlflow/exportLineageAsCsv" -H "accept:application/json;charset=utf-8" -H "Content-Type:multipart/form-data" -F "userId=auth0|5fc8e95991a780006f180d4d" -F "token=YOUR TOKEN HERE" -F "dbvendor=dbvoracle" -F "showRelationType=fdd" -F "sqlfile=@c:\prg\tmp\demo.sql" --output c:\prg\tmp\demo.csv
```
**Note:**
* -H "Content-Type:multipart/form-data" is required.
* Add **@** before the upload file path
* --output is required.
* Optional, if you just want to fetch table to table relations, please add **-F "tableToTable=true"**
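The same export can be sketched in Python; the upload and the `--output` step map to a multipart file field and writing the response body to disk (the file names below are placeholders):
```python
# Sketch of exporting the lineage of a SQL file as CSV in Python,
# mirroring the exportLineageAsCsv curl example above.
import requests

url = 'https://api.gudusoft.com/gspLive_backend/sqlflow/generation/sqlflow/exportLineageAsCsv'
form = {
    'userId': 'YOUR USER ID HERE',
    'token': 'YOUR TOKEN HERE',
    'dbvendor': 'dbvoracle',
    'showRelationType': 'fdd',
    # 'tableToTable': 'true',   # optional: only table to table relations
}
with open('demo.sql', 'rb') as f:
    r = requests.post(url, data=form, files={'sqlfile': f})

with open('demo.csv', 'wb') as out:   # equivalent of curl's --output
    out.write(r.content)
```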
#### 4. Submit multiple SQL files and get the data lineage in CSV, JSON, graphml format.
<a href="sqlflow-job-api-tutorial.md">Rest APIs: Job</a>
### The full reference to the Rest APIs
[SQLFlow rest API reference](sqlflow_api.md)
### Troubleshooting
- Under Windows, you may need to add the option `--ssl-no-revoke` to avoid some security issues: `curl --ssl-no-revoke`


@ -0,0 +1,255 @@
- [SQLFlow Job API tutorial](#sqlflow-job-api-tutorial)
* [1. Prerequisites](#1-prerequisites)
+ [Difference of the API calls between SQLFlow Cloud server and SQLFlow on-premise version](#difference-of-the-api-calls-between-sqlflow-cloud-server-and-sqlflow-on-premise-version)
+ [Generate a token](#generate-a-token)
* [2. Different type of Job](#2-different-type-of-job)
* [3. Simple job rest API](#3-simple-job-rest-api)
+ [1. Submit a sqlflow job](#1-submit-a-sqlflow-job)
+ [2. Get job status](#2-get-job-status)
+ [3. Export data lineage](#3-export-data-lineage)
* [4. Regular job rest API](#4-regular-job-rest-api)
## SQLFlow Job API tutorial
This article describes how to use the Job Rest API provided by SQLFlow to
communicate with the SQLFlow server and export the data lineage in json, csv, graphml formats.
### 1. Prerequisites
In order to use the SQLFlow rest API, you may connect to the [**SQLFlow Cloud server**](https://sqlflow.gudusoft.com),
or set up a [**SQLFlow on-premise version**](https://www.gudusoft.com/sqlflow-on-premise-version/) on your own server.
1. **SQLFlow Cloud server**
- User ID
- Secret Key
If you want to connect to [the SQLFlow Cloud Server](https://sqlflow.gudusoft.com), you may [request a 30-day premium account](https://www.gudusoft.com/request-a-premium-account/) to
[get the necessary userId and secret code](/sqlflow-userid-secret.md).
2. **SQLFlow on-premise version**
Please [check here](https://github.com/sqlparser/sqlflow_public/blob/master/install_sqlflow.md) to see how to install the SQLFlow on-premise version on your own server.
- User ID
- Secret Key
Always set userId to `gudu|0123456789` and keep `userSecret` empty when connecting to the SQLFlow on-premise version.
#### Difference of the API calls between SQLFlow Cloud server and SQLFlow on-premise version
1. TOKEN is not needed in the API calls when connecting to the SQLFlow on-premise version
2. userId is always set to `gudu|0123456789` and `userSecret` is left empty when connecting to the SQLFlow on-premise version.
3. The server port is 8081 by default for the SQLFlow on-premise version, and there is no need to specify the port when connecting to the SQLFlow Cloud server.
Regarding the server port of the SQLFlow on-premise version, please [check here](https://github.com/sqlparser/sqlflow_public/tree/master/grabit#1-sqlflow-server) for more information.
#### Generate a token
Once you have the `userid` and `secret key`, the first API you need to call is:
```
/gspLive_backend/user/generateToken
```
This API will return a temporary token that needs to be used in the API call thereafter.
```
curl -X POST "https://api.gudusoft.com/gspLive_backend/user/generateToken" -H "Request-Origion:testClientDemo" -H "accept:application/json;charset=utf-8" -H "Content-Type:application/x-www-form-urlencoded;charset=UTF-8" -d "secretKey=YOUR SECRET KEY" -d "userId=YOUR USER ID HERE"
```
For more details, please see https://github.com/sqlparser/sqlflow_public/edit/master/api/readme.md
### 2. Different type of Job
![SQLFlow job types](job-types.png)
### 3. Simple job rest API
#### 1. Submit a sqlflow job
Call this API by sending the SQL files to get a result that includes the data lineage. A SQLFlow job supports both multiple files and zip archive files.
```
/gspLive_backend/sqlflow/job/submitUserJob
```
Example in `Curl`
```
curl -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/job/submitUserJob" -H "accept:application/json;charset=utf-8" -H "Content-Type:multipart/form-data" -F "userId=YOUR USER ID HERE" -F "token=YOUR TOKEN HERE" -F "sqlfiles=@FIRST FILE PATH" -F "sqlfiles=@SECOND FILE PATH" -F "dbvendor=dbvmssql" -F "jobName=job1"
```
**Note:**
* **-H "Content-Type:multipart/form-data"** is required
* Add **@** before the file path
Return data:
```json
{
"code":200,
"data":{
"jobId":"c359aef4bd9641d697732422debd8055",
"jobName":"job1",
"userId":"google-oauth2|104002923119102769706",
"dbVendor":"dbvmssql",
"dataSource":{
},
"fileNames":["1.sql","1.zip"],
"createTime":"2020-12-15 15:14:39",
"status":"create"
}
}
```
Please record the jobId field.
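Example in `Python` (a sketch based on the curl call above; the two file names are placeholders, and the repeated `sqlfiles` parts correspond to the repeated `-F "sqlfiles=@..."` options):
```python
# Sketch of submitUserJob in Python: several SQL files are sent as repeated 'sqlfiles' parts.
import requests

url = 'https://api.gudusoft.com/gspLive_backend/sqlflow/job/submitUserJob'
form = {
    'userId': 'YOUR USER ID HERE',
    'token': 'YOUR TOKEN HERE',
    'dbvendor': 'dbvmssql',
    'jobName': 'job1',
}
files = [
    ('sqlfiles', open('first.sql', 'rb')),
    ('sqlfiles', open('second.sql', 'rb')),
]
r = requests.post(url, data=form, files=files)
jobId = r.json()['data']['jobId']   # keep the jobId for the status and export calls
print(jobId)
```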
#### 2. Get job status
* Get the specify user job status and summary
```
/gspLive_backend/sqlflow/job/displayUserJobSummary
```
Example in `Curl`
```
curl -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/job/displayUserJobSummary" -F "jobId=c359aef4bd9641d697732422debd8055" -F "userId=YOUR USER ID HERE" -F "token=YOUR TOKEN HERE"
```
Return data:
```json
{
"code":200,
"data":{
"jobId":"c359aef4bd9641d697732422debd8055",
"jobName":"job1",
"userId":"google-oauth2|104002923119102769706",
"dbVendor":"dbvmssql",
"dataSource":{
},
"fileNames":["1.sql","1.zip"],
"createTime":"2020-12-15 15:14:39",
"status":"success",
"sessionId":"fe5898d4e1b1a7782352b50a8203ca24c04f5513446e9fb059fc4d584fab4dbf_1608045280033"
}
}
```
* Get all jobs (include history jobs) status and summary
```
/gspLive_backend/sqlflow/job/displayUserJobsSummary
```
Example in `Curl`
```
curl -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/job/displayUserJobsSummary" -F "userId=YOUR USER ID HERE" -F "token=YOUR TOKEN HERE"
```
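Example in `Python` (a sketch that polls `displayUserJobSummary` until the job reaches a terminal status, mirroring the curl call above):
```python
# Sketch of polling the job status until it is success, partial_success or fail.
import time
import requests

url = 'https://api.gudusoft.com/gspLive_backend/sqlflow/job/displayUserJobSummary'
form = {
    'jobId': 'c359aef4bd9641d697732422debd8055',
    'userId': 'YOUR USER ID HERE',
    'token': 'YOUR TOKEN HERE',
}
while True:
    status = requests.post(url, data=form).json()['data']['status']
    if status in ('success', 'partial_success', 'fail'):
        break
    time.sleep(2)
print('job finished with status:', status)
```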
#### 3. Export data lineage
When the job status is **success**, you can export the data lineage in json, csv, graphml formats
* 3.1 Export data lineage in json format
```
/gspLive_backend/sqlflow/job/exportLineageAsJson
```
Example in `Curl`
```
curl -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/job/exportLineageAsJson" -F "userId=YOUR USER ID HERE" -F "token=YOUR TOKEN HERE" -F "jobId=c359aef4bd9641d697732422debd8055" --output lineage.json
```
**Note:**
> If you want to get table to table relation, please add option -F "tableToTable=true"
* 3.2 Export data lineage in csv format
```
/gspLive_backend/sqlflow/job/exportFullLineageAsCsv
```
Example in `Curl`
```
curl -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/job/exportFullLineageAsCsv" -F "userId=YOUR USER ID HERE" -F "token=YOUR TOKEN HERE" -F "jobId=c359aef4bd9641d697732422debd8055" --output lineage.csv
```
**Note:**
> If you want to get table to table relation, please add option -F "tableToTable=true"
> If you want to change csv delimiter, please add option -F "delimiter=&lt;delimiter char&gt;"
* 3.3 Export data lineage in graphml format; you can view the lineage graph in the yEd Graph Editor
```
/gspLive_backend/sqlflow/job/exportLineageAsGraphml
```
Example in `Curl`
```
curl -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/job/exportLineageAsGraphml" -F "userId=YOUR USER ID HERE" -F "token=YOUR TOKEN HERE" -F "jobId=c359aef4bd9641d697732422debd8055" --output lineage.graphml
```
**Note:**
> If you want to get table to table relation, please add option -F "tableToTable=true"
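Example in `Python` (a sketch that exports the finished job's lineage in all three formats, mirroring the curl calls above; the output file names are placeholders):
```python
# Sketch of exporting the lineage of a finished job in json, csv and graphml formats.
import requests

base = 'https://api.gudusoft.com/gspLive_backend/sqlflow/job/'
form = {
    'jobId': 'c359aef4bd9641d697732422debd8055',
    'userId': 'YOUR USER ID HERE',
    'token': 'YOUR TOKEN HERE',
    # 'tableToTable': 'true',   # optional: only table to table relations
}
exports = {
    'exportLineageAsJson': 'lineage.json',
    'exportFullLineageAsCsv': 'lineage.csv',
    'exportLineageAsGraphml': 'lineage.graphml',
}
for endpoint, filename in exports.items():
    r = requests.post(base + endpoint, data=form)
    with open(filename, 'wb') as out:   # equivalent of curl's --output
        out.write(r.content)
```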
### 4. Regular job rest API
#### 1. Submit a regular job
Call this API by sending the SQL files to get a result that includes the data lineage. A SQLFlow job supports both multiple files and zip archive files.
If the job is incremental, please set incremental=true:
* first submit: jobId is null; record the jobId field from the response message
* second submit: jobId can't be null; fill in the jobId returned by the first submit response.
```
/gspLive_backend/sqlflow/job/submitPersistJob
```
Example in `Curl`
```
curl -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/job/submitPersistJob" -H "accept:application/json;charset=utf-8" -H "Content-Type:multipart/form-data" -F "userId=YOUR USER ID HERE" -F "token=YOUR TOKEN HERE" -F "sqlfiles=@FIRST FILE PATH" -F "sqlfiles=@SECOND FILE PATH" -F "dbvendor=dbvmssql" -F "jobName=job1" -F "incremental=true"
```
Incremental submit in `Curl`
```
curl -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/job/submitPersistJob" -H "accept:application/json;charset=utf-8" -H "Content-Type:multipart/form-data" -F "userId=YOUR USER ID HERE" -F "token=YOUR TOKEN HERE" -F "sqlfiles=@FIRST FILE PATH" -F "sqlfiles=@SECOND FILE PATH" -F "dbvendor=dbvmssql" -F "jobName=job1" -F "incremental=true" -F "jobId=JobId OF FIRST SUBMIT"
```
**Note:**
* **-H "Content-Type:multipart/form-data"** is required
* Add **@** before the file path
Return data:
```json
{
"code":200,
"data":{
"jobId":"c359aef4bd9641d697732422debd8055",
"jobName":"job1",
"userId":"google-oauth2|104002923119102769706",
"dbVendor":"dbvmssql",
"dataSource":{
},
"fileNames":["1.sql","1.zip"],
"createTime":"2020-12-15 15:14:39",
"status":"create"
}
}
```
Please record the jobId field.

465
api/sqlflow_api.md Normal file

@ -0,0 +1,465 @@
# SQLFlow WebAPI
## JWT Client API Authorization (for sqlflow client api call)
* All of the restful requests are based on JWT authorization. Before accessing the sqlflow WebAPI, the client user needs to obtain the corresponding JWT token for authorized access.
* How to get JWT Token
1. Login to [the sqlflow web](https://sqlflow.gudusoft.com) and upgrade to a premium account.
2. Move the mouse over the login user image, select the Account menu item, and click the "generate" button to generate the user secret key.
3. Once you have the user secret key, you can call the **/gspLive_backend/user/generateToken** api to obtain a token; the TTL of a new token is 24 hours.
4. **/gspLive_backend/user/generateToken**
* **userId**: the user id of sqlflow web or client, required **true**
* **secretKey**: the secret key of sqlflow user for webapi request, required **true**
* How to use JWT Token for security authentication?
* Each webapi contains two parameters, named userId and token.
## WebAPI
### Sqlflow Generation Interface
* **/sqlflow/generation/sqlflow/graph**
* Description: generate sqlflow model and graph
* HTTP Method: **POST**
* Parameters:
* **userId**: the user id of sqlflow web or client, required **true**
* **token**: The token is only used when connecting to the SQLFlow Cloud server; it is not used when connecting to the SQLFlow on-premise version.
* sqltext: sql text, optional
* sqlfile: sql file, optional
* **dbvendor**: database vendor, required **true**, available values:
* dbvbigquery, dbvcouchbase,dbvdb2,dbvgreenplum,dbvhana,dbvhive,dbvimpala,dbvinformix,dbvmdx,dbvmysql,dbvnetezza,dbvopenedge,dbvoracle,dbvpostgresql,dbvredshift,dbvsnowflake,dbvmssql,dbvsybase,dbvteradata,dbvvertica
* simpleOutput: simple output, ignore the intermediate results, default is false.
* ignoreRecordSet: same as simpleOutput, but will keep output of the top level select list, default is false.
* dataflowOfAggregateFunction: treat the dataflow generated by the aggregate function as direct dataflow or not, default is direct.
* hideColumn: whether hide the column ui, required false, default value is false
* ignoreFunction: whether ignore the function relations, required false, default value is false
* showConstantTable: return constant or not, default is false.
* showLinkOnly: whether show relation linked columns only, required false, default value is true
* showRelationType: show relation type, optional, default value is **fdd**, multiple values separated by comma like fdd,frd,fdr. Available values:
* **fdd**: value of target column from source column
* **join**: combine rows from two or more tables, based on a related column between them
* Return code:
* 200: successful
* other: failed, check the error field to get error message.
* Sample:
* test sql:
```sql
select name from user
```
* curl command:
```bash
curl -X POST "http://127.0.0.1:8081/gspLive_backend/sqlflow/generation/sqlflow/graph" -H "accept:application/json;charset=utf-8" -F "userId=your user id here" -F "token=your token here" -F "dbvendor=dbvoracle" -F "ignoreFunction=true" -F "ignoreRecordSet=true" -F "sqltext=select name from user"
```
* response:
```json
{
"code": 200,
"data": {
"mode": "global",
"summary": {
...
},
"sqlflow": {
"dbvendor": "dbvoracle",
"dbobjs": [
...
]
},
"graph": {
"elements": {
"tables": [
...
],
"edges": [
...
]
},
"tooltip": {},
"relationIdMap": {
...
},
"listIdMap": {
...
}
}
},
"sessionId": "6172a4095280ccce97e996242d8b4084f46e2c954455e71339aeffccad5f0d57_1599501562051"
}
```
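* Python sketch (based on the curl command above; the sessionId in the response is reused by follow-up calls such as **selectedgraph**):
```python
# Sketch of the /sqlflow/generation/sqlflow/graph call in Python, based on the curl sample above.
import requests

url = 'http://127.0.0.1:8081/gspLive_backend/sqlflow/generation/sqlflow/graph'
form = {
    'userId': 'your user id here',
    'token': 'your token here',
    'dbvendor': 'dbvoracle',
    'ignoreFunction': 'true',
    'ignoreRecordSet': 'true',
    'sqltext': 'select name from user',
}
result = requests.post(url, data=form).json()
sessionId = result['sessionId']   # reuse this sessionId in the selectedgraph call
print(result['code'], sessionId)
```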
* **/sqlflow/generation/sqlflow/selectedgraph**
* Description: generate sqlflow model and selected dbobject graph
* HTTP Method: **POST**
* Parameters:
* **userId**: the user id of sqlflow web or client, required **true**
* **token**: the token of sqlflow client request. sqlflow web, required false, sqlflow client, required true
* **sessionId**: request sessionId, the value is from api **/sqlflow/generation/sqlflow/graph**, required **true**
* database: selected database, required false
* schema: selected schema, required false
* table: selected table, required false
* isReturnModel: whether return the sqlflow model, required false, default value is true
* **dbvendor**: database vendor, required **true**, available values:
* dbvbigquery, dbvcouchbase,dbvdb2,dbvgreenplum,dbvhana,dbvhive,dbvimpala,dbvinformix,dbvmdx,dbvmysql,dbvnetezza,dbvopenedge,dbvoracle,dbvpostgresql,dbvredshift,dbvsnowflake,dbvmssql,dbvsybase,dbvteradata,dbvvertica
* showRelationType: show relation type, required false, default value is **fdd**, multiple values separated by comma like fdd,frd,fdr. Available values:
* **fdd**: value of target column from source column
* **frd**: the recordset count of target column which is affect by value of source column
* **fdr**: value of target column which is affected by the recordset count of source column
* **join**: combine rows from two or more tables, based on a related column between them
* simpleOutput: whether output relation simply, required false, default value is false
* ignoreRecordSet: whether ignore the record sets, required false, default value is false
* showLinkOnly: whether show relation linked columns only, required false, default value is true
* hideColumn: whether hide the column ui, required false, default value is false
* ignoreFunction: whether ignore the function relations, required false, default value is false
* Return code:
* 200: successful
* other: failed, check the error field to get error message.
* Sample:
* test sql:
```sql
select name from user
```
* session id: `6172a4095280ccce97e996242d8b4084f46e2c954455e71339aeffccad5f0d57_1599501562051`
* curl command:
```bash
curl -X POST "http://127.0.0.1:8081/gspLive_backend/sqlflow/generation/sqlflow/selectedgraph" -H "accept:application/json;charset=utf-8" -F "userId=google-oauth2|104002923119102769706" -F "token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhdWQiOiJndWR1c29mdCIsImV4cCI6MTYxMDEyMTYwMCwiaWF0IjoxNTc4NTg1NjAwfQ.9AAIkjZ3NF7Pns-hRjZQqRHprcsj1dPKHquo8zEp7jE" -F "dbvendor=dbvoracle" -F "ignoreFunction=true" -F "ignoreRecordSet=true" -F "isReturnModel=false" -F "sessionId=6172a4095280ccce97e996242d8b4084f46e2c954455e71339aeffccad5f0d57_1599501562051" -F "table=user"
```
* response:
```json
{
"code": 200,
"data": {
"mode": "global",
"summary": {
...
},
"graph": {
"elements": {
"tables": [
...
],
"edges": [
...
]
},
"tooltip": {},
"relationIdMap": {
...
},
"listIdMap": {
...
}
}
},
"sessionId": "6172a4095280ccce97e996242d8b4084f46e2c954455e71339aeffccad5f0d57_1599501562051"
}
```
* **/sqlflow/generation/sqlflow/getSelectedDbObjectInfo**
* Description: get the selected dbobject information, such as file information, sql index, dbobject positions, and the sql which contains the selected dbobject.
* HTTP Method: **POST**
* Parameters:
* **userId**: the user id of sqlflow web or client, required **true**
* **token**: the token of sqlflow client request. sqlflow web, required false, sqlflow client, required true
* **sessionId**: request sessionId, the value is from api **/sqlflow/generation/sqlflow/graph**, required **true**
* **coordinates**: the select dbobject positions, it's a json array string, the value is from api **/sqlflow/generation/sqlflow/graph**, required **true**
* Return code:
* 200: successful
* other: failed, check the error field to get error message.
* Sample:
* test sql:
```sql
select name from user
```
* session id: `6172a4095280ccce97e996242d8b4084f46e2c954455e71339aeffccad5f0d57_1599501562051`
* coordinates: `[{'x':1,'y':8,'hashCode':'3630d5472af5f149fe3fb2202c8a338d'},{'x':1,'y':12,'hashCode':'3630d5472af5f149fe3fb2202c8a338d'}]`
* curl command:
```bash
curl -X POST "http://127.0.0.1:8081/gspLive_backend/sqlflow/generation/sqlflow/getSelectedDbObjectInfo" -H "accept:application/json;charset=utf-8" -F "userId=google-oauth2|104002923119102769706" -F "token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhdWQiOiJndWR1c29mdCIsImV4cCI6MTYxMDEyMTYwMCwiaWF0IjoxNTc4NTg1NjAwfQ.9AAIkjZ3NF7Pns-hRjZQqRHprcsj1dPKHquo8zEp7jE" -F "coordinates=[{'x':1,'y':8,'hashCode':'3630d5472af5f149fe3fb2202c8a338d'},{'x':1,'y':12,'hashCode':'3630d5472af5f149fe3fb2202c8a338d'}]" -F "sessionId=6172a4095280ccce97e996242d8b4084f46e2c954455e71339aeffccad5f0d57_1599501562051"
```
* response:
```json
{
"code": 200,
"data": [
{
"index": 0,
"positions": [
{
"x": 1,
"y": 8
},
{
"x": 1,
"y": 12
}
],
"sql": "select name from user"
}
]
}
```
### Sqlflow User Job Interface
* **/sqlflow/job/submitUserJob**
* Description: submit a user job with multiple sql files; zip files are supported.
* HTTP Method: **POST**
* Parameters:
* **userId**: the user id of sqlflow web or client, required **true**
* **token**: the token of sqlflow client request. sqlflow web, required false, sqlflow client, required true
* **jobName**: job name, required **true**
* **dbvendor**: database vendor, required **true**, available values:
* dbvbigquery, dbvcouchbase,dbvdb2,dbvgreenplum,dbvhana,dbvhive,dbvimpala,dbvinformix,dbvmdx,dbvmysql,dbvnetezza,dbvopenedge,dbvoracle,dbvpostgresql,dbvredshift,dbvsnowflake,dbvmssql,dbvsybase,dbvteradata,dbvvertica
* **sqlfiles**: request sql files, please use **multiple parts** to submit the sql files, required **true**
* Return code:
* 200: successful
* other: failed, check the error field to get error message.
* Sample:
* test sql file: D:\sql.txt
* curl command (**note**: please add `@` before the sql file path):
```bash
curl -X POST "http://127.0.0.1:8081/gspLive_backend/sqlflow/job/submitUserJob" -H "accept:application/json;charset=utf-8" -H "Content-Type:multipart/form-data" -F "userId=google-oauth2|104002923119102769706" -F "token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhdWQiOiJndWR1c29mdCIsImV4cCI6MTYxMDEyMTYwMCwiaWF0IjoxNTc4NTg1NjAwfQ.9AAIkjZ3NF7Pns-hRjZQqRHprcsj1dPKHquo8zEp7jE" -F "sqlfiles=@D:/sql.txt" -F "dbvendor=dbvoracle" -F "jobName=job_test"
```
* response:
```json
{
"code": 200,
"data": {
"jobId": "6218721f092540c5a771ca8f82986be7",
"jobName": "job_test",
"userId": "user_test",
"dbVendor": "dbvoracle",
"defaultDatabase": "",
"defaultSchema": "",
"fileNames": [
"sql.txt"
],
"createTime": "2020-09-08 10:11:28",
"status": "create"
}
}
```
* **/sqlflow/job/displayUserJobsSummary**
* Description: get the summary information of the user's jobs.
* HTTP Method: **POST**
* Parameters:
* **userId**: the user id of sqlflow web or client, required **true**
* **token**: the token of sqlflow client request. sqlflow web, required false, sqlflow client, required true
* Return code:
* 200: successful
* other: failed, check the error field to get error message.
* Sample:
* test sql file: D:\sql.txt
* curl command:
```bash
curl -X POST "http://127.0.0.1:8081/gspLive_backend/sqlflow/job/displayUserJobsSummary" -H "accept:application/json;charset=utf-8" -F "userId=google-oauth2|104002923119102769706" -F "token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhdWQiOiJndWR1c29mdCIsImV4cCI6MTYxMDEyMTYwMCwiaWF0IjoxNTc4NTg1NjAwfQ.9AAIkjZ3NF7Pns-hRjZQqRHprcsj1dPKHquo8zEp7jE"
```
* response:
```json
{
"code": 200,
"data": {
"total": 1,
"success": 1,
"partialSuccess": 0,
"fail": 0,
"jobIds": [
"bb996c1ee5b741c5b4ff6c2c66c371dd"
],
"jobDetails": [
{
"jobId": "bb996c1ee5b741c5b4ff6c2c66c371dd",
"jobName": "job_test",
"userId": "user_test",
"dbVendor": "dbvoracle",
"fileNames": [
"sql.txt"
],
"createTime": "2020-09-08 10:16:11",
"status": "success",
"sessionId": "a9f751281f8ef6936c554432e169359190d392565208931f201523e08036109d_1599531372233"
}
]
}
}
```
* **/sqlflow/job/displayUserJobSummary**
* Description: get the information of the specified user job.
* HTTP Method: **POST**
* Parameters:
* **userId**: the user id of sqlflow web or client, required **true**
* **token**: the token of sqlflow client request. sqlflow web, required false, sqlflow client, required true
* **jobId**: job id, the value is from user jobs summary detail, required **true**
* Return code:
* 200: successful
* other: failed, check the error field to get error message.
* Sample:
* test sql file: D:\sql.txt
* job id: bb996c1ee5b741c5b4ff6c2c66c371dd
* curl command:
```bash
curl -X POST "http://127.0.0.1:8081/gspLive_backend/sqlflow/job/displayUserJobSummary" -H "accept:application/json;charset=utf-8" -F "userId=google-oauth2|104002923119102769706" -F "token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhdWQiOiJndWR1c29mdCIsImV4cCI6MTYxMDEyMTYwMCwiaWF0IjoxNTc4NTg1NjAwfQ.9AAIkjZ3NF7Pns-hRjZQqRHprcsj1dPKHquo8zEp7jE" -F "jobId=bb996c1ee5b741c5b4ff6c2c66c371dd"
```
* response:
```json
{
"code": 200,
"data": {
"total": 1,
"success": 1,
"partialSuccess": 0,
"fail": 0,
"jobIds": [
"bb996c1ee5b741c5b4ff6c2c66c371dd"
],
"jobDetails": [
{
"jobId": "bb996c1ee5b741c5b4ff6c2c66c371dd",
"jobName": "job_test",
"userId": "user_test",
"dbVendor": "dbvoracle",
"fileNames": [
"sql.txt"
],
"createTime": "2020-09-08 10:16:11",
"status": "success",
"sessionId": "a9f751281f8ef6936c554432e169359190d392565208931f201523e08036109d_1599531372233"
}
]
}
}
```
* **/sqlflow/job/deleteUserJob**
* Description: delete the user job by job id.
* HTTP Method: **POST**
* Parameters:
* **userId**: the user id of sqlflow web or client, required **true**
* **token**: the token of sqlflow client request. sqlflow web, required false, sqlflow client, required true
* **jobId**: job id, the value is from user job detail, required **true**
* Return code:
* 200: successful
* other: failed, check the error field to get error message.
* Sample:
* test sql file: D:\sql.txt
* job id: bb996c1ee5b741c5b4ff6c2c66c371dd
* curl command:
```bash
curl -X POST "http://127.0.0.1:8081/gspLive_backend/sqlflow/job/deleteUserJob" -H "accept:application/json;charset=utf-8" -F "userId=google-oauth2|104002923119102769706" -F "token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhdWQiOiJndWR1c29mdCIsImV4cCI6MTYxMDEyMTYwMCwiaWF0IjoxNTc4NTg1NjAwfQ.9AAIkjZ3NF7Pns-hRjZQqRHprcsj1dPKHquo8zEp7jE" -F "jobId=bb996c1ee5b741c5b4ff6c2c66c371dd"
```
* response:
```json
{
"code": 200,
"data": {
"jobId": "bb996c1ee5b741c5b4ff6c2c66c371dd",
"jobName": "job_test",
"userId": "user_test",
"dbVendor": "dbvoracle",
"fileNames": [
"sql.txt"
],
"createTime": "2020-09-08 10:16:11",
"status": "delete",
"sessionId": "a9f751281f8ef6936c554432e169359190d392565208931f201523e08036109d_1599531372233"
}
}
```
* **/sqlflow/job/displayUserJobGraph**
* Description: get the sqlflow job's model and graph
* HTTP Method: **POST**
* Parameters:
* **userId**: the user id of sqlflow web or client, required **true**
* **token**: the token of sqlflow client request. sqlflow web, required false, sqlflow client, required true
* **jobId**: job id, the value is from user jobs summary detail, required **true**
* database: selected database, required false
* schema: selected schema, required false
* table: selected table, required false
* isReturnModel: whether return the sqlflow model, required false, default value is true
* showRelationType: relation types to show, required false, default value is **fdd**, multiple values separated by commas such as fdd,frd,fdr. Available values:
* **fdd**: the value of the target column comes from the source column
* **frd**: the recordset count of the target column is affected by the value of the source column
* **fdr**: the value of the target column is affected by the recordset count of the source column
* **join**: combines rows from two or more tables based on a related column between them
* simpleOutput: whether to output relations in simplified form, required false, default value is false
* ignoreRecordSet: whether to ignore record sets, required false, default value is false
* showLinkOnly: whether to show only columns linked by relations, required false, default value is true
* hideColumn: whether to hide the column UI, required false, default value is false
* ignoreFunction: whether to ignore function relations, required false, default value is false
* Return code:
* 200: successful
* other: failed, check the error field to get error message.
* Sample:
* test sql file: D:\sql.txt
* job id: bb996c1ee5b741c5b4ff6c2c66c371dd
* curl command:
```bash
curl -X POST "http://127.0.0.1:8081/gspLive_backend/sqlflow/job/displayUserJobGraph?showRelationType=fdd&showRelationType=" -H "accept:application/json;charset=utf-8" -F "userId=google-oauth2|104002923119102769706" -F "token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhdWQiOiJndWR1c29mdCIsImV4cCI6MTYxMDEyMTYwMCwiaWF0IjoxNTc4NTg1NjAwfQ.9AAIkjZ3NF7Pns-hRjZQqRHprcsj1dPKHquo8zEp7jE" -F "jobId=bb996c1ee5b741c5b4ff6c2c66c371dd" -F "ignoreFunction=true" -F "ignoreRecordSet=true" -F "isReturnModel=false" -F "jobId=bb996c1ee5b741c5b4ff6c2c66c371dd" -F "table=user"
```
* response:
```json
{
"code": 200,
"data": {
"mode": "global",
"summary": {
...
},
"graph": {
"elements": {
"tables": [
...
],
"edges": [
...
]
},
"tooltip": {},
"relationIdMap": {
...
},
"listIdMap": {
...
}
}
},
"sessionId": "a9f751281f8ef6936c554432e169359190d392565208931f201523e08036109d_1599531372233"
}
```
* **/sqlflow/job/updateUserJobGraphCache**
* Description: update the user job graph cache so that the user can call **/sqlflow/generation/sqlflow/selectedgraph** with the sessionId; the sessionId value is from the job detail.
* HTTP Method: **POST**
* Parameters:
* **userId**: the user id of sqlflow web or client, required **true**
* **token**: the token of sqlflow client request. sqlflow web, required false, sqlflow client, required true
* **jobId**: job id, the value is from user job detail, required **true**
* Return code:
* 200: successful
* other: failed, check the error field to get error message.
* Sample:
* test sql file: D:\sql.txt
* job id: bb996c1ee5b741c5b4ff6c2c66c371dd
* curl command:
```bash
curl -X POST "http://127.0.0.1:8081/gspLive_backend/sqlflow/job/updateUserJobGraphCache" -H "Request-Origion:SwaggerBootstrapUi" -H "accept:application/json;charset=utf-8" -F "userId=google-oauth2|104002923119102769706" -F "token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhdWQiOiJndWR1c29mdCIsImV4cCI6MTYxMDEyMTYwMCwiaWF0IjoxNTc4NTg1NjAwfQ.9AAIkjZ3NF7Pns-hRjZQqRHprcsj1dPKHquo8zEp7jE" -F "jobId=bb996c1ee5b741c5b4ff6c2c66c371dd"
```
* response:
```json
{
"code": 200,
"data": {
"sessionId": "a9f751281f8ef6936c554432e169359190d392565208931f201523e08036109d_1599531372233"
}
}
```
## Swagger
For more information, please check the test environment Swagger document:
* http://111.229.12.71:8081/gspLive_backend/doc.html?lang=en
* Token: `eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhdWQiOiJndWR1c29mdCIsImV4cCI6MTYwMzc1NjgwMCwiaWF0IjoxNTcyMjIwODAwfQ.EhlnJO7oqAHdr0_bunhtrN-TgaGbARKvTh2URTxu9iU`

517
api/sqlflow_api_full.md Normal file
View File

@ -0,0 +1,517 @@
# SQLFlow WebAPI
## JWT WEB Authorization (Only for sqlflow web)
* All of the restful requests are based on JWT authorization. Before accessing the sqlflow WebAPI, the web user needs to obtain the corresponding JWT token for authorized access.
* How to use JWT Token for security authentication?
* In the header of the HTTP request, please pass the parameters:
```
Key: Authorization
Value: Token <token>
```
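For example, a hedged sketch of passing this header with curl on the generation endpoint described below (the host, user id, and token values are placeholders, not real credentials):
```bash
# Web-style call: the JWT token goes into the Authorization header instead of a form field.
curl -X POST "http://127.0.0.1:8081/gspLive_backend/sqlflow/generation/sqlflow" \
     -H "accept:application/json;charset=utf-8" \
     -H "Authorization: Token YOUR_JWT_TOKEN_HERE" \
     -F "userId=YOUR_USER_ID_HERE" \
     -F "dbvendor=dbvoracle" \
     -F "sqltext=select name from user"
```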
## JWT Client API Authorization (for sqlflow client api call)
* All of the restful requests are based on JWT authorization. Before accessing the sqlflow WebAPI, the client user needs to obtain the corresponding JWT token for authorized access.
* How to get JWT Token
1. Log in on the sqlflow web
2. Hover the mouse over the login user image and click the "generate token" menu item; you will get the user secret key and token, and the ttl of the token is 24 hours.
3. Once you have the user secret key, you can call the **/gspLive_backend/user/generateToken** api to refresh the token; the ttl of the new token is also 24 hours.
4. **/gspLive_backend/user/generateToken**
* **userId**: the user id of sqlflow web or client, required **true**
* **secretKey**: the secret key of sqlflow user for webapi request, required **true**
* How to use JWT Token for security authentication?
* Each webapi carries two parameters named userId and token (see the sample request below).
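A minimal sketch of refreshing a token via **/gspLive_backend/user/generateToken** (the host matches the local test host used in the samples below; the userId and secretKey values are placeholders):
```bash
# Exchange the user secret key for a fresh JWT token (ttl 24 hours).
curl -X POST "http://127.0.0.1:8081/gspLive_backend/user/generateToken" \
     -H "accept:application/json;charset=utf-8" \
     -H "Content-Type:application/x-www-form-urlencoded;charset=UTF-8" \
     -d "userId=YOUR_USER_ID_HERE" \
     -d "secretKey=YOUR_SECRET_KEY_HERE"
```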
## WebAPI
### Sqlflow Generation Interface
* **/sqlflow/generation/sqlflow**
* Description: generate sqlflow model
* HTTP Method: **POST**
* Parameters:
* **userId**: the user id of sqlflow web or client, required **true**
* **token**: the token of sqlflow client request. sqlflow web, required false, sqlflow client, required true
* sqltext: sql text, required false
* sqlfile: sql file, required false
* **dbvendor**: database vendor, required **true**, available values:
* dbvbigquery, dbvcouchbase,dbvdb2,dbvgreenplum,dbvhana,dbvhive,dbvimpala,dbvinformix,dbvmdx,dbvmysql,dbvnetezza,dbvopenedge,dbvoracle,dbvpostgresql,dbvredshift,dbvsnowflake,dbvmssql,dbvsybase,dbvteradata,dbvvertica
* showRelationType: relation types to show, required false, default value is **fdd**, multiple values separated by commas such as fdd,frd,fdr. Available values:
* **fdd**: the value of the target column comes from the source column
* **frd**: the recordset count of the target column is affected by the value of the source column
* **fdr**: the value of the target column is affected by the recordset count of the source column
* **join**: combines rows from two or more tables based on a related column between them
* simpleOutput: whether to output relations in simplified form, required false, default value is false
* ignoreRecordSet: whether to ignore record sets, required false, default value is false
* Return code:
* 200: successful
* other: failed, check the error field to get error message.
* Sample:
* test sql:
```sql
select name from user
```
* curl command:
```bash
curl -X POST "http://127.0.0.1:8081/gspLive_backend/sqlflow/generation/sqlflow" -H "accept:application/json;charset=utf-8" -F "userId=google-oauth2|104002923119102769706" -F "token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhdWQiOiJndWR1c29mdCIsImV4cCI6MTYxMDEyMTYwMCwiaWF0IjoxNTc4NTg1NjAwfQ.9AAIkjZ3NF7Pns-hRjZQqRHprcsj1dPKHquo8zEp7jE" -F "dbvendor=dbvoracle" -F "ignoreRecordSet=true" -F "sqltext=select name from user"
```
* response:
```json
{
"code": 200,
"data": {
"dbvendor": "dbvoracle",
"dbobjs": [
...
],
"relations": [
...
]
},
"sessionId": "6172a4095280ccce97e996242d8b4084f46e2c954455e71339aeffccad5f0d57_1599501108040"
}
```
* **/sqlflow/generation/sqlflow/graph**
* Description: generate sqlflow model and graph
* HTTP Method: **POST**
* Parameters:
* **userId**: the user id of sqlflow web or client, required **true**
* **token**: the token of sqlflow client request. sqlflow web, required false, sqlflow client, required true
* sqltext: sql text, required false
* sqlfile: sql file, required false
* **dbvendor**: database vendor, required **true**, available values:
* dbvbigquery, dbvcouchbase,dbvdb2,dbvgreenplum,dbvhana,dbvhive,dbvimpala,dbvinformix,dbvmdx,dbvmysql,dbvnetezza,dbvopenedge,dbvoracle,dbvpostgresql,dbvredshift,dbvsnowflake,dbvmssql,dbvsybase,dbvteradata,dbvvertica
* showRelationType: relation types to show, required false, default value is **fdd**, multiple values separated by commas such as fdd,frd,fdr. Available values:
* **fdd**: the value of the target column comes from the source column
* **frd**: the recordset count of the target column is affected by the value of the source column
* **fdr**: the value of the target column is affected by the recordset count of the source column
* **join**: combines rows from two or more tables based on a related column between them
* simpleOutput: whether to output relations in simplified form, required false, default value is false
* ignoreRecordSet: whether to ignore record sets, required false, default value is false
* showLinkOnly: whether to show only columns linked by relations, required false, default value is true
* hideColumn: whether to hide the column UI, required false, default value is false
* ignoreFunction: whether to ignore function relations, required false, default value is false
* Return code:
* 200: successful
* other: failed, check the error field to get error message.
* Sample:
* test sql:
```sql
select name from user
```
* curl command:
```bash
curl -X POST "http://127.0.0.1:8081/gspLive_backend/sqlflow/generation/sqlflow/graph" -H "accept:application/json;charset=utf-8" -F "userId=google-oauth2|104002923119102769706" -F "token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhdWQiOiJndWR1c29mdCIsImV4cCI6MTYxMDEyMTYwMCwiaWF0IjoxNTc4NTg1NjAwfQ.9AAIkjZ3NF7Pns-hRjZQqRHprcsj1dPKHquo8zEp7jE" -F "dbvendor=dbvoracle" -F "ignoreFunction=true" -F "ignoreRecordSet=true" -F "sqltext=select name from user"
```
* response:
```json
{
"code": 200,
"data": {
"mode": "global",
"summary": {
...
},
"sqlflow": {
"dbvendor": "dbvoracle",
"dbobjs": [
...
]
},
"graph": {
"elements": {
"tables": [
...
],
"edges": [
...
]
},
"tooltip": {},
"relationIdMap": {
...
},
"listIdMap": {
...
}
}
},
"sessionId": "6172a4095280ccce97e996242d8b4084f46e2c954455e71339aeffccad5f0d57_1599501562051"
}
```
* **/sqlflow/generation/sqlflow/selectedgraph**
* Description: generate sqlflow model and selected dbobject graph
* HTTP Method: **POST**
* Parameters:
* **userId**: the user id of sqlflow web or client, required **true**
* **token**: the token of sqlflow client request. sqlflow web, required false, sqlflow client, required true
* **sessionId**: request sessionId, the value is from api **/sqlflow/generation/sqlflow/graph**, required **true**
* database: selected database, required false
* schema: selected schema, required false
* table: selected table, required false
* isReturnModel: whether return the sqlflow model, required false, default value is true
* **dbvendor**: database vendor, required **true**, available values:
* dbvbigquery, dbvcouchbase,dbvdb2,dbvgreenplum,dbvhana,dbvhive,dbvimpala,dbvinformix,dbvmdx,dbvmysql,dbvnetezza,dbvopenedge,dbvoracle,dbvpostgresql,dbvredshift,dbvsnowflake,dbvmssql,dbvsybase,dbvteradata,dbvvertica
* showRelationType: relation types to show, required false, default value is **fdd**, multiple values separated by commas such as fdd,frd,fdr. Available values:
* **fdd**: the value of the target column comes from the source column
* **frd**: the recordset count of the target column is affected by the value of the source column
* **fdr**: the value of the target column is affected by the recordset count of the source column
* **join**: combines rows from two or more tables based on a related column between them
* simpleOutput: whether to output relations in simplified form, required false, default value is false
* ignoreRecordSet: whether to ignore record sets, required false, default value is false
* showLinkOnly: whether to show only columns linked by relations, required false, default value is true
* hideColumn: whether to hide the column UI, required false, default value is false
* ignoreFunction: whether to ignore function relations, required false, default value is false
* Return code:
* 200: successful
* other: failed, check the error field to get error message.
* Sample:
* test sql:
```sql
select name from user
```
* session id: `6172a4095280ccce97e996242d8b4084f46e2c954455e71339aeffccad5f0d57_1599501562051`
* curl command:
```bash
curl -X POST "http://127.0.0.1:8081/gspLive_backend/sqlflow/generation/sqlflow/selectedgraph" -H "accept:application/json;charset=utf-8" -F "userId=google-oauth2|104002923119102769706" -F "token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhdWQiOiJndWR1c29mdCIsImV4cCI6MTYxMDEyMTYwMCwiaWF0IjoxNTc4NTg1NjAwfQ.9AAIkjZ3NF7Pns-hRjZQqRHprcsj1dPKHquo8zEp7jE" -F "dbvendor=dbvoracle" -F "ignoreFunction=true" -F "ignoreRecordSet=true" -F "isReturnModel=false" -F "sessionId=6172a4095280ccce97e996242d8b4084f46e2c954455e71339aeffccad5f0d57_1599501562051" -F "table=user"
```
* response:
```json
{
"code": 200,
"data": {
"mode": "global",
"summary": {
...
},
"graph": {
"elements": {
"tables": [
...
],
"edges": [
...
]
},
"tooltip": {},
"relationIdMap": {
...
},
"listIdMap": {
...
}
}
},
"sessionId": "6172a4095280ccce97e996242d8b4084f46e2c954455e71339aeffccad5f0d57_1599501562051"
}
```
* **/sqlflow/generation/sqlflow/getSelectedDbObjectInfo**
* Description: get the selected dbobject information, such as file information, sql index, dbobject positions, and the sql which contains the selected dbobject.
* HTTP Method: **POST**
* Parameters:
* **userId**: the user id of sqlflow web or client, required **true**
* **token**: the token of sqlflow client request. sqlflow web, required false, sqlflow client, required true
* **sessionId**: request sessionId, the value is from api **/sqlflow/generation/sqlflow/graph**, required **true**
* **coordinates**: the selected dbobject positions, a json array string; the value is from api **/sqlflow/generation/sqlflow/graph**, required **true**
* Return code:
* 200: successful
* other: failed, check the error field to get error message.
* Sample:
* test sql:
```sql
select name from user
```
* session id: `6172a4095280ccce97e996242d8b4084f46e2c954455e71339aeffccad5f0d57_1599501562051`
* coordinates: `[{'x':1,'y':8,'hashCode':'3630d5472af5f149fe3fb2202c8a338d'},{'x':1,'y':12,'hashCode':'3630d5472af5f149fe3fb2202c8a338d'}]`
* curl command:
```bash
curl -X POST "http://127.0.0.1:8081/gspLive_backend/sqlflow/generation/sqlflow/getSelectedDbObjectInfo" -H "accept:application/json;charset=utf-8" -F "userId=google-oauth2|104002923119102769706" -F "token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhdWQiOiJndWR1c29mdCIsImV4cCI6MTYxMDEyMTYwMCwiaWF0IjoxNTc4NTg1NjAwfQ.9AAIkjZ3NF7Pns-hRjZQqRHprcsj1dPKHquo8zEp7jE" -F "coordinates=[{'x':1,'y':8,'hashCode':'3630d5472af5f149fe3fb2202c8a338d'},{'x':1,'y':12,'hashCode':'3630d5472af5f149fe3fb2202c8a338d'}]" -F "sessionId=6172a4095280ccce97e996242d8b4084f46e2c954455e71339aeffccad5f0d57_1599501562051"
```
* response:
```json
{
"code": 200,
"data": [
{
"index": 0,
"positions": [
{
"x": 1,
"y": 8
},
{
"x": 1,
"y": 12
}
],
"sql": "select name from user"
}
]
}
```
### Sqlflow User Job Interface
* **/sqlflow/job/submitUserJob**
* Description: submit a user job with multiple sql files; zip files are supported.
* HTTP Method: **POST**
* Parameters:
* **userId**: the user id of sqlflow web or client, required **true**
* **token**: the token of sqlflow client request. sqlflow web, required false, sqlflow client, required true
* **jobName**: job name, required **true**
* **dbvendor**: database vendor, required **true**, available values:
* dbvbigquery, dbvcouchbase,dbvdb2,dbvgreenplum,dbvhana,dbvhive,dbvimpala,dbvinformix,dbvmdx,dbvmysql,dbvnetezza,dbvopenedge,dbvoracle,dbvpostgresql,dbvredshift,dbvsnowflake,dbvmssql,dbvsybase,dbvteradata,dbvvertica
* **sqlfiles**: request sql files, please use **multiple parts** to submit the sql files, required **true**
* Return code:
* 200: successful
* other: failed, check the error field to get error message.
* Sample:
* test sql file: D:\sql.txt
* curl command (**note**: please add `@` before the sql file path):
```bash
curl -X POST "http://127.0.0.1:8081/gspLive_backend/sqlflow/job/submitUserJob" -H "accept:application/json;charset=utf-8" -H "Content-Type:multipart/form-data" -F "userId=google-oauth2|104002923119102769706" -F "token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhdWQiOiJndWR1c29mdCIsImV4cCI6MTYxMDEyMTYwMCwiaWF0IjoxNTc4NTg1NjAwfQ.9AAIkjZ3NF7Pns-hRjZQqRHprcsj1dPKHquo8zEp7jE" -F "sqlfiles=@D:/sql.txt" -F "dbvendor=dbvoracle" -F "jobName=job_test"
```
* response:
```json
{
"code": 200,
"data": {
"jobId": "6218721f092540c5a771ca8f82986be7",
"jobName": "job_test",
"userId": "user_test",
"dbVendor": "dbvoracle",
"defaultDatabase": "",
"defaultSchema": "",
"fileNames": [
"sql.txt"
],
"createTime": "2020-09-08 10:11:28",
"status": "create"
}
}
```
* **/sqlflow/job/displayUserJobsSummary**
* Description: get the summary information of the user's jobs.
* HTTP Method: **POST**
* Parameters:
* **userId**: the user id of sqlflow web or client, required **true**
* **token**: the token of sqlflow client request. sqlflow web, required false, sqlflow client, required true
* Return code:
* 200: successful
* other: failed, check the error field to get error message.
* Sample:
* test sql file: D:\sql.txt
* curl command:
```bash
curl -X POST "http://127.0.0.1:8081/gspLive_backend/sqlflow/job/displayUserJobsSummary" -H "accept:application/json;charset=utf-8" -F "userId=google-oauth2|104002923119102769706" -F "token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhdWQiOiJndWR1c29mdCIsImV4cCI6MTYxMDEyMTYwMCwiaWF0IjoxNTc4NTg1NjAwfQ.9AAIkjZ3NF7Pns-hRjZQqRHprcsj1dPKHquo8zEp7jE"
```
* response:
```json
{
"code": 200,
"data": {
"total": 1,
"success": 1,
"partialSuccess": 0,
"fail": 0,
"jobIds": [
"bb996c1ee5b741c5b4ff6c2c66c371dd"
],
"jobDetails": [
{
"jobId": "bb996c1ee5b741c5b4ff6c2c66c371dd",
"jobName": "job_test",
"userId": "user_test",
"dbVendor": "dbvoracle",
"fileNames": [
"sql.txt"
],
"createTime": "2020-09-08 10:16:11",
"status": "success",
"sessionId": "a9f751281f8ef6936c554432e169359190d392565208931f201523e08036109d_1599531372233"
}
]
}
}
```
* **/sqlflow/job/displayUserJobSummary**
* Description: get the information of the specified user job.
* HTTP Method: **POST**
* Parameters:
* **userId**: the user id of sqlflow web or client, required **true**
* **token**: the token of sqlflow client request. sqlflow web, required false, sqlflow client, required true
* **jobId**: job id, the value is from user jobs summary detail, required **true**
* Return code:
* 200: successful
* other: failed, check the error field to get error message.
* Sample:
* test sql file: D:\sql.txt
* job id: bb996c1ee5b741c5b4ff6c2c66c371dd
* curl command:
```bash
curl -X POST "http://127.0.0.1:8081/gspLive_backend/sqlflow/job/displayUserJobSummary" -H "accept:application/json;charset=utf-8" -F "userId=google-oauth2|104002923119102769706" -F "token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhdWQiOiJndWR1c29mdCIsImV4cCI6MTYxMDEyMTYwMCwiaWF0IjoxNTc4NTg1NjAwfQ.9AAIkjZ3NF7Pns-hRjZQqRHprcsj1dPKHquo8zEp7jE" -F "jobId=bb996c1ee5b741c5b4ff6c2c66c371dd"
```
* response:
```json
{
"code": 200,
"data": {
"total": 1,
"success": 1,
"partialSuccess": 0,
"fail": 0,
"jobIds": [
"bb996c1ee5b741c5b4ff6c2c66c371dd"
],
"jobDetails": [
{
"jobId": "bb996c1ee5b741c5b4ff6c2c66c371dd",
"jobName": "job_test",
"userId": "user_test",
"dbVendor": "dbvoracle",
"fileNames": [
"sql.txt"
],
"createTime": "2020-09-08 10:16:11",
"status": "success",
"sessionId": "a9f751281f8ef6936c554432e169359190d392565208931f201523e08036109d_1599531372233"
}
]
}
}
```
* **/sqlflow/job/deleteUserJob**
* Description: delete the user job by job id.
* HTTP Method: **POST**
* Parameters:
* **userId**: the user id of sqlflow web or client, required **true**
* **token**: the token of sqlflow client request. sqlflow web, required false, sqlflow client, required true
* **jobId**: job id, the value is from user job detail, required **true**
* Return code:
* 200: successful
* other: failed, check the error field to get error message.
* Sample:
* test sql file: D:\sql.txt
* job id: bb996c1ee5b741c5b4ff6c2c66c371dd
* curl command:
```bash
curl -X POST "http://127.0.0.1:8081/gspLive_backend/sqlflow/job/deleteUserJob" -H "accept:application/json;charset=utf-8" -F "userId=google-oauth2|104002923119102769706" -F "token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhdWQiOiJndWR1c29mdCIsImV4cCI6MTYxMDEyMTYwMCwiaWF0IjoxNTc4NTg1NjAwfQ.9AAIkjZ3NF7Pns-hRjZQqRHprcsj1dPKHquo8zEp7jE" -F "jobId=bb996c1ee5b741c5b4ff6c2c66c371dd"
```
* response:
```json
{
"code": 200,
"data": {
"jobId": "bb996c1ee5b741c5b4ff6c2c66c371dd",
"jobName": "job_test",
"userId": "user_test",
"dbVendor": "dbvoracle",
"fileNames": [
"sql.txt"
],
"createTime": "2020-09-08 10:16:11",
"status": "delete",
"sessionId": "a9f751281f8ef6936c554432e169359190d392565208931f201523e08036109d_1599531372233"
}
}
```
* **/sqlflow/job/displayUserJobGraph**
* Description: get the sqlflow job's model and graph
* HTTP Method: **POST**
* Parameters:
* **userId**: the user id of sqlflow web or client, required **true**
* **token**: the token of sqlflow client request. sqlflow web, required false, sqlflow client, required true
* **jobId**: job id, the value is from user jobs summary detail, required **true**
* database: selected database, required false
* schema: selected schema, required false
* table: selected table, required false
* isReturnModel: whether return the sqlflow model, required false, default value is true
* showRelationType: relation types to show, required false, default value is **fdd**, multiple values separated by commas such as fdd,frd,fdr. Available values:
* **fdd**: the value of the target column comes from the source column
* **frd**: the recordset count of the target column is affected by the value of the source column
* **fdr**: the value of the target column is affected by the recordset count of the source column
* **join**: combines rows from two or more tables based on a related column between them
* simpleOutput: whether to output relations in simplified form, required false, default value is false
* ignoreRecordSet: whether to ignore record sets, required false, default value is false
* showLinkOnly: whether to show only columns linked by relations, required false, default value is true
* hideColumn: whether to hide the column UI, required false, default value is false
* ignoreFunction: whether to ignore function relations, required false, default value is false
* Return code:
* 200: successful
* other: failed, check the error field to get error message.
* Sample:
* test sql file: D:\sql.txt
* job id: bb996c1ee5b741c5b4ff6c2c66c371dd
* curl command:
```bash
curl -X POST "http://127.0.0.1:8081/gspLive_backend/sqlflow/job/displayUserJobGraph?showRelationType=fdd&showRelationType=" -H "accept:application/json;charset=utf-8" -F "userId=google-oauth2|104002923119102769706" -F "token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhdWQiOiJndWR1c29mdCIsImV4cCI6MTYxMDEyMTYwMCwiaWF0IjoxNTc4NTg1NjAwfQ.9AAIkjZ3NF7Pns-hRjZQqRHprcsj1dPKHquo8zEp7jE" -F "jobId=bb996c1ee5b741c5b4ff6c2c66c371dd" -F "ignoreFunction=true" -F "ignoreRecordSet=true" -F "isReturnModel=false" -F "jobId=bb996c1ee5b741c5b4ff6c2c66c371dd" -F "table=user"
```
* response:
```json
{
"code": 200,
"data": {
"mode": "global",
"summary": {
...
},
"graph": {
"elements": {
"tables": [
...
],
"edges": [
...
]
},
"tooltip": {},
"relationIdMap": {
...
},
"listIdMap": {
...
}
}
},
"sessionId": "a9f751281f8ef6936c554432e169359190d392565208931f201523e08036109d_1599531372233"
}
```
* **/sqlflow/job/updateUserJobGraphCache**
* Description: update the user job graph cache so that the user can call **/sqlflow/generation/sqlflow/selectedgraph** with the sessionId; the sessionId value is from the job detail.
* HTTP Method: **POST**
* Parameters:
* **userId**: the user id of sqlflow web or client, required **true**
* **token**: the token of sqlflow client request. sqlflow web, required false, sqlflow client, required true
* **jobId**: job id, the value is from user job detail, required **true**
* Return code:
* 200: successful
* other: failed, check the error field to get error message.
* Sample:
* test sql file: D:\sql.txt
* job id: bb996c1ee5b741c5b4ff6c2c66c371dd
* curl command:
```bash
curl -X POST "http://127.0.0.1:8081/gspLive_backend/sqlflow/job/updateUserJobGraphCache" -H "Request-Origion:SwaggerBootstrapUi" -H "accept:application/json;charset=utf-8" -F "userId=google-oauth2|104002923119102769706" -F "token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhdWQiOiJndWR1c29mdCIsImV4cCI6MTYxMDEyMTYwMCwiaWF0IjoxNTc4NTg1NjAwfQ.9AAIkjZ3NF7Pns-hRjZQqRHprcsj1dPKHquo8zEp7jE" -F "jobId=bb996c1ee5b741c5b4ff6c2c66c371dd"
```
* response:
```json
{
"code": 200,
"data": {
"sessionId": "a9f751281f8ef6936c554432e169359190d392565208931f201523e08036109d_1599531372233"
}
}
```
## Swagger
For more information, please check the test environment Swagger document:
* http://111.229.12.71:8081/gspLive_backend/doc.html?lang=en
* Token: `eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhdWQiOiJndWR1c29mdCIsImV4cCI6MTYwMzc1NjgwMCwiaWF0IjoxNTcyMjIwODAwfQ.EhlnJO7oqAHdr0_bunhtrN-TgaGbARKvTh2URTxu9iU`

View File

@ -0,0 +1,51 @@
## How to use Rest API of SQLFlow
This article describes how to use the Rest API provided by SQLFlow to
communicate with the SQLFlow server and get the generated metadata and data lineage.
In this article, we use `Curl` to demonstrate the usage of the Rest API;
you can use any programming language you prefer.
### Prerequisites
To use the Rest API of the SQLFlow, you need to <a href="https://gudusoft.com">obtain a premium account</a>.
After that, you will get the `userid` and `secret key`, which will be used in the API calls.
- User ID
- Secret Key
### Call Rest API
#### 1. Generate a token
Once you have the `userid` and `secret key`, the first API you need to call is:
```
/gspLive_backend/user/generateToken
```
This API will return a temporary token that needs to be used in the API calls thereafter.
```
curl -X POST "https://api.gudusoft.com/gspLive_backend/user/generateToken" -H "Request-Origion:testClientDemo" -H "accept:application/json;charset=utf-8" -H "Content-Type:application/x-www-form-urlencoded;charset=UTF-8" -d "secretKey=YOUR SECRET KEY" -d "userId=YOUR USER ID HERE"
```
#### 2. Generate the data lineage
Call this API with the SQL query to get a result that includes the data lineage.
```
/gspLive_backend/sqlflow/generation/sqlflow
```
Example in `Curl`
```
curl -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/generation/sqlflow?showRelationType=fdd" -H "Request-Origion:testClientDemo" -H "accept:application/json;charset=utf-8" -H "Content-Type:multipart/form-data" -F "sqlfile=" -F "dbvendor=dbvoracle" -F "ignoreRecordSet=false" -F "simpleOutput=false" -F "sqltext=CREATE VIEW vsal as select * from emp" -F "userId=YOUR USER ID HERE" -F "token=YOUR TOKEN HERE"
```
#### 3. Other features
You can also use the rest API to submit a zip file that includes many SQL files (a sketch follows below) or generate a map of the columns in the join condition.
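For instance, a hedged sketch of submitting a zip archive of SQL files to the job API; the file name `queries.zip`, the job name, and the credential values are placeholders, and the endpoint is documented in the full API reference linked below:
```
# Submit a zip of SQL files as a job; the sqlfiles part accepts a single zip archive.
curl -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/job/submitUserJob" -H "Request-Origion:testClientDemo" -H "accept:application/json;charset=utf-8" -H "Content-Type:multipart/form-data" -F "sqlfiles=@queries.zip" -F "dbvendor=dbvoracle" -F "jobName=zip_job_demo" -F "userId=YOUR USER ID HERE" -F "token=YOUR TOKEN HERE"
```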
### The full reference to the Rest APIs
[SQLFlow rest API reference](sqlflow_api.md)

View File

@ -0,0 +1,263 @@
# SQLFlow Web UI Control
![SQLFlow Control](../images/sqlflow_web_ui_control.png)
The SQLFlow Web UI provides several options to control the result:
1. hide all columns
* only affects the UI; the table column UI height becomes 0.
2. dataflow
* show fdd relations.
3. impact
* show frd, fdr relations.
4. show intermediate recordset
* display or hide intermediate recordsets.
5. show function
* display or hide functions.
## Web API Call
We use the restful api **/sqlflow/generation/sqlflow/graph** to get the sqlflow graph; it has several arguments:
* **userId**: the user id of sqlflow web or client, required **true**
* **token**: the token of sqlflow client request. sqlflow web, required false, sqlflow client, required true
* sqltext: sql text, required false
* sqlfile: sql file, required false
* **dbvendor**: database vendor, required **true**, available values:
* dbvbigquery, dbvcouchbase,dbvdb2,dbvgreenplum,dbvhana,dbvhive,dbvimpala,dbvinformix,dbvmdx,dbvmysql,dbvnetezza,dbvopenedge,dbvoracle,dbvpostgresql,dbvredshift,dbvsnowflake,dbvmssql,dbvsybase,dbvteradata,dbvvertica
* showRelationType: relation types to show, required false, default value is **fdd**, multiple values separated by commas such as fdd,frd,fdr. Available values:
* **fdd**: the value of the target column comes from the source column
* **frd**: the recordset count of the target column is affected by the value of the source column
* **fdr**: the value of the target column is affected by the recordset count of the source column
* **join**: combines rows from two or more tables based on a related column between them
* simpleOutput: whether to output relations in simplified form, required false, default value is false
* ignoreRecordSet: whether to ignore record sets, required false, default value is false
* showLinkOnly: whether to show only columns linked by relations, required false, default value is true
* hideColumn: whether to hide the column UI, required false, default value is false
* ignoreFunction: whether to ignore function relations, required false, default value is false
## How to Control The Sqlflow Web UI
1. hide all columns
* it matches the `hideColumn` argument. If the argument is `true`, `hide all columns` will be checked.
2. dataflow
* it matches the `showRelationType` argument. If the argument contains `fdd`, `dataflow` will be checked.
3. impact
* it matches the `showRelationType` argument. If the argument contains `frd,fdr`, `impact` will be checked.
4. show intermediate recordset
* it matches the `ignoreRecordSet` argument. If the argument is `false`, `show intermediate recordset` will be checked.
5. show function
* it matches the `ignoreFunction` argument. If the argument is `false`, `show function` will be checked.
![SQLFlow Join](../images/sqlflow_web_ui_join.png)
1. Visualize join
* show join relations.
* it matches the `showRelationType` argument. If the argument is `join`, `Visualize join` will be displayed.
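As an illustration, a hedged request that corresponds to checking `hide all columns` and `dataflow` in the UI might look like this (the host, user id, and token are placeholders):
```bash
# dataflow view with all columns hidden: showRelationType=fdd and hideColumn=true
curl -X POST "http://127.0.0.1:8081/gspLive_backend/sqlflow/generation/sqlflow/graph" \
     -H "accept:application/json;charset=utf-8" \
     -F "userId=YOUR_USER_ID_HERE" \
     -F "token=YOUR_TOKEN_HERE" \
     -F "dbvendor=dbvoracle" \
     -F "showRelationType=fdd" \
     -F "hideColumn=true" \
     -F "sqltext=select name from user"
```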
![SQLFlow Error Message](../images/sqlflow_error_message.png)
If sqlflow encounters errors, they will be shown in the sqlflow json.
Sqlflow error messages have 4 types:
* SYNTAX_ERROR
* gsp returns error messages while parsing the sql.
* SYNTAX_HINT
* gsp returns hint messages while parsing the sql.
* ANALYZE_ERROR
* the dataflow analyzer encountered an error.
* LINK_ORPHAN_COLUMN
* the dataflow analyzer returns a hint about linking an orphan column.
## Get the Error Message Position
Typically, if the dataflow returns error messages, the lineage xml will show:
```xml
<dlineage>
...
<error errorMessage="find orphan column(10500) near: quantity(4,22)" errorType="SyntaxHint" coordinate="[4,22,0],[4,30,0]" originCoordinate="[4,22],[4,30]"/>
</dlineage>
```
Note the `coordinate="[4,22,0],[4,30,0]"` attribute: we can use it to get the error position. [4,22,0] is the start position, [4,30,0] is the end position, and 0 is the index of the SQLInfo hashcode.
## How to Use WebAPI to Point the Position
* **/sqlflow/generation/sqlflow/getSelectedDbObjectInfo**
* Description: get the selected dbobject information, such as file information, sql index, dbobject positions, and the sql which contains the selected dbobject.
* HTTP Method: **POST**
* Parameters:
* **userId**: the user id of sqlflow web or client, required **true**
* **token**: the token of sqlflow client request. sqlflow web, required false, sqlflow client, required true
* **sessionId**: request sessionId, the value is from api **/sqlflow/generation/sqlflow/graph**, required **true**
* **coordinates**: the selected dbobject positions, a json array string; the value is from api **/sqlflow/generation/sqlflow/graph**, required **true**
* Return code:
* 200: successful
* other: failed, check the error field to get error message.
* Sample:
* test sql:
```sql
select name from user
```
* session id: `6172a4095280ccce97e996242d8b4084f46e2c954455e71339aeffccad5f0d57_1599501562051`
* coordinates: `[{'x':1,'y':8,'hashCode':'0'},{'x':1,'y':12,'hashCode':'0'}]`
* curl command:
```bash
curl -X POST "http://127.0.0.1:8081/gspLive_backend/sqlflow/generation/sqlflow/getSelectedDbObjectInfo" -H "accept:application/json;charset=utf-8" -F "userId=google-oauth2|104002923119102769706" -F "token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhdWQiOiJndWR1c29mdCIsImV4cCI6MTYxMDEyMTYwMCwiaWF0IjoxNTc4NTg1NjAwfQ.9AAIkjZ3NF7Pns-hRjZQqRHprcsj1dPKHquo8zEp7jE" -F "coordinates=[{'x':1,'y':8,'hashCode':'3630d5472af5f149fe3fb2202c8a338d'},{'x':1,'y':12,'hashCode':'3630d5472af5f149fe3fb2202c8a338d'}]" -F "sessionId=6172a4095280ccce97e996242d8b4084f46e2c954455e71339aeffccad5f0d57_1599501562051"
```
* response:
```json
{
"code": 200,
"data": [
{
"index": 0,
"positions": [
{
"x": 1,
"y": 8
},
{
"x": 1,
"y": 12
}
],
"sql": "select name from user"
}
]
}
```
## Get SQL Information By SQLFlow Coordinate
### SQLInfo
When sqlflow finishes analyzing the sql, it records some sql information that we can use to locate database object positions.
```java
public class SqlInfo {
private String fileName;
private String sql;
private int originIndex;
private int index;
private String group;
private int originLineStart;
private int originLineEnd;
private int lineStart;
private int lineEnd;
private String hash;
}
```
Each sql file maps to a SqlInfo object, and the map key is the "hash" property.
Sqlflow provides a tool class **gudusoft.gsqlparser.dlineage.util.SqlInfoHelper**, which can transform a dataflow coordinate into a `DbObjectPosition`.
### SqlInfoHelper
1. First step, call api `SqlInfoHelper.getSqlInfoJson` to fetch the sqlinfo map from the DataFlowAnalyzer object, and persist it.
```java
public static String getSqlInfoJson(DataFlowAnalyzer analyzer);
```
2. Second step, initialize the SqlInfoHelper with the sqlinfo json string.
```java
//Constructor
public SqlInfoHelper(String sqlInfoJson);
```
3. Third step, transform the sqlflow position string into a `dataflow.model.json.Coordinate` array.
* If you use the `dataflow.model.json.DataFlow` model, you can get the Coordinate object directly; no transformation is needed.
* If you use the `dataflow.model.xml.dataflow` model, you can call the api `SqlInfoHelper.parseCoordinateString`
```java
public static Coordinate[][] parseCoordinateString(String coordinate);
```
* The parseCoordinateString method supports both the xml output coordinate string and the json output coordinate string, like these:
```
//xml output coordinate string
[56,36,0],[56,62,0]
//json output coordinate string
[{"x":31,"y":36,"hashCode":"0"},{"x":31,"y":38,"hashCode":"0"}]
```
4. Fourth step, get the DbObjectPosition by api `getSelectedDbObjectInfo`
```java
public DbObjectPosition getSelectedDbObjectInfo(Coordinate start, Coordinate end);
```
* Each position has two coordinates, a start coordinate and an end coordinate. If the result of DBObject.getCoordinates() has 10 items, it matches 5 positions.
* The position is based on the entire file, not on a single statement.
* The sql field of DbObjectPosition returns all sqls of the file.
5. If you just want to get information for a specific statement, please call the api `getSelectedDbObjectStatementInfo`
```java
public DbObjectPosition getSelectedDbObjectStatementInfo(EDbVendor vendor, Coordinate start, Coordinate end);
```
* The position is based on the statement.
* Returns the statement index within the sqls; the index is **0-based**.
* Returns a single statement, not all sqls of the file.
### How to Use DbObjectPosition
```java
public class DbObjectPosition {
private String file;
private String sql;
private int index;
private List<Pair<Integer, Integer>> positions = new ArrayList<Pair<Integer, Integer>>();
}
```
* file field matches the sql file name.
* sql field matches the sql content.
* index:
* If the sql file is from `grabit`, it's a json file with a json array named "query"; the value of the index field is the query item index.
* Otherwise, the value of the index field is 0.
* positions: locations of the database object, matched against the sql field. Position x and y are **1-based**, not 0-based.
### Example 1 (getSelectedDbObjectInfo)
```java
String sql = "Select\n a\nfrom\n b;";
DataFlowAnalyzer dataflow = new DataFlowAnalyzer(sql, EDbVendor.dbvmssql, false);
dataflow.generateDataFlow(new StringBuffer());
dataflow flow = dataflow.getDataFlow();
String coordinate = flow.getTables().get(0).getCoordinate();
Coordinate[][] coordinates = SqlInfoHelper.parseCoordinateString(coordinate);
SqlInfoHelper helper = new SqlInfoHelper(SqlInfoHelper.getSqlInfoJson(dataflow));
DbObjectPosition position = helper.getSelectedDbObjectInfo(coordinates[0][0], coordinates[0][1]);
System.out.println(position.getSql());
System.out.println("table " + flow.getTables().get(0).getName() + " position is " + Arrays.toString(position.getPositions().toArray()));
```
Return:
```java
Select
a
from
b;
table b position is [[4,2], [4,3]]
```
### Example 2 (getSelectedDbObjectStatementInfo)
```java
String sql = "Select\n a\nfrom\n b;\n Select c from d;";
DataFlowAnalyzer dataflow = new DataFlowAnalyzer(sql, EDbVendor.dbvmssql, false);
dataflow.generateDataFlow(new StringBuffer());
gudusoft.gsqlparser.dlineage.dataflow.model.xml.dataflow flow = dataflow.getDataFlow();
String coordinate = flow.getTables().get(1).getCoordinate();
Coordinate[][] coordinates = SqlInfoHelper.parseCoordinateString(coordinate);
SqlInfoHelper helper = new SqlInfoHelper(SqlInfoHelper.getSqlInfoJson(dataflow));
DbObjectPosition position = helper.getSelectedDbObjectStatementInfo(EDbVendor.dbvmssql, coordinates[0][0], coordinates[0][1]);
System.out.println(position.getSql());
System.out.println(
"table " + flow.getTables().get(1).getName() + " position is " + Arrays.toString(position.getPositions().toArray()));
System.out.println(
"stmt index is " + position.getIndex());
```
Return:
```java
Select c from d;
table d position is [[1,20], [1,21]]
stmt index is 1
```

View File

@ -0,0 +1,58 @@
## Element in the data lineage xml output generated by the SQLFlow
### Table
`Table` is one of the major elements in the output of the data lineage.
The `type` of a `table` element can be the value of `table`, `pseudoTable`
#### 1. type = "table"
This means a base table found in the SQL query.
```sql
create view v123 as select a,b from employee a, name b where employee.id = name.id
```
```xml
<table id="2" name="employee" alias="a" type="table">
```
#### 2. type = "pseudoTable"
Due to the lack of metadata information, some columns can't be linked to a table correctly.
Those columns will be assigned to a pseudo table with name: `pseudo_table_include_orphan_column`.
The type of this table is `pseudoTable`.
In the following sample sql, columm `a`, `b` can't be linked to a specific table without enough information,
so a pseudo table with name `pseudo_table_include_orphan_column` is created to contain those orphan columns.
```sql
create view v123 as select a,b from employee a, name b where employee.id = name.id
```
```xml
<table id="11" name="pseudo_table_include_orphan_column" type="pseudoTable" coordinate="[1,1,f904f8312239df09d5e008bb9d69b466],[1,35,f904f8312239df09d5e008bb9d69b466]">
<column id="12" name="a" coordinate="[1,28,f904f8312239df09d5e008bb9d69b466],[1,29,f904f8312239df09d5e008bb9d69b466]"/>
<column id="14" name="b" coordinate="[1,30,f904f8312239df09d5e008bb9d69b466],[1,31,f904f8312239df09d5e008bb9d69b466]"/>
</table>
```
#### tableType
In the most case of SQL query, the table used is a base table.
However, derived tables are also used in the from clause or other places.
The `tableType` property in the `table` element tells you what kind of the derived table this table is.
Take the following sql for example, `WarehouseReporting.dbo.fnListToTable` is a function that
used as a derived table. So, the value of `tableType` is `function`.
Currently(GSP 2.2.0.6), `function` is the only value of `tableType`. More value of `tableType` will be added in the later version
such as `JSON_TABLE` for JSON_TABLE.
```sql
select entry as Account FROM WarehouseReporting.dbo.fnListToTable(@AccountList)
```
```xml
<table id="2" database="WarehouseReporting" schema="dbo" name="WarehouseReporting.dbo.fnListToTable" type="table" tableType="function" coordinate="[1,30,15c3ec5e6df0919bb570c4d8cdd66651],[1,87,15c3ec5e6df0919bb570c4d8cdd66651]">
<column id="3" name="entry" coordinate="[1,8,15c3ec5e6df0919bb570c4d8cdd66651],[1,13,15c3ec5e6df0919bb570c4d8cdd66651]"/>
</table>
```

View File

@ -0,0 +1,70 @@
## Automated data lineage from Azure (Command Line Mode)
This article introduces how to discover the data lineage from Azure scripts or an Azure database and automatically keep it up to date,
so that business users and developers can see the Azure data lineage graph instantly.
### Software used in this solution
- [SQLFlow Cloud](https://sqlflow.gudusoft.com) Or [SQLFlow on-premise version](https://www.gudusoft.com/sqlflow-on-premise-version/)
- [Grabit tool](https://www.gudusoft.com/grabit/) for SQLFlow. It's free.
### Install grabit tool
After [downloading the grabit tool](https://www.gudusoft.com/grabit/), please [check this article](https://github.com/sqlparser/sqlflow_public/tree/master/grabit)
to see how to set up the grabit tool.
### Discover data lineage in an Azure database
- Modify the `conf-template\azure-config-template` to meet your environment.
Here is a sample config file, `azure-config`, that grabs metadata from the remote Azure database
and sends the metadata to the SQLFlow Cloud to discover the data lineage.
You need [a premium account](https://github.com/sqlparser/sqlflow_public/blob/master/sqlflow-userid-secret.md) to access the SQLFlow Cloud.
```json
{
"databaseType":"azure",
"optionType":1,
"resultType":1,
"databaseServer":{
"hostname":"azure ip address",
"port":"1433",
"username":"azure user name",
"password":"your password here",
"database":"",
"extractedDbsSchemas":"",
"excludedDbsSchemas":"",
"extractedStoredProcedures":"",
"extractedViews":"",
"enableQueryHistory":false,
"queryHistoryBlockOfTimeInMinutes":30
},
"SQLFlowServer":{
"server":"https://api.gudusoft.com",
"serverPort":"",
"userId":"your sqlflow premium account id",
"userSecret":"your sqlflow premium account secret code"
},
"neo4jConnection":{
"url":"",
"username":"",
"password":""
},
"isUploadNeo4j":0
}
```
- Run the grabit command-line tool; you can find grabit.log under the logs directory.
```
./start.sh /f azure-config
```
- Check out the diagram via this url: [https://sqlflow.gudusoft.com/#/job/latest](https://sqlflow.gudusoft.com/#/job/latest)
- You may save the data lineage in JSON/CSV/GRAPHML format.
The file will be saved under the `data\datalineage` directory.
- Run the grabit at a scheduled time, as sketched below.
[Please check the instructions here](https://github.com/sqlparser/sqlflow_public/tree/master/grabit#run-the-grabit-at-a-scheduled-time)
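For example, a minimal crontab sketch that refreshes the Azure lineage every night; the install path `/opt/grabit` and the log file name are assumptions, adjust them to your environment:
```
# crontab entry: run grabit with the azure-config file every day at 02:00
0 2 * * * cd /opt/grabit && ./start.sh /f azure-config >> logs/grabit-cron.log 2>&1
```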

View File

@ -0,0 +1,68 @@
## Automated data lineage from Azure (GUI Mode)
This article introduces how to discover the data lineage from Azure scripts or an Azure database and automatically keep it up to date,
so that business users and developers can see the Azure data lineage graph instantly.
### Software used in this solution
- [SQLFlow Cloud](https://sqlflow.gudusoft.com) Or [SQLFlow on-premise version](https://www.gudusoft.com/sqlflow-on-premise-version/)
- [Grabit tool](https://www.gudusoft.com/grabit/) for SQLFlow. It's free.
### Install grabit tool
After [downloading the grabit tool](https://www.gudusoft.com/grabit/), please [check this article](https://github.com/sqlparser/sqlflow_public/tree/master/grabit)
to see how to set up the grabit tool.
### Discover data lineage in an Azure database
- After [starting up the grabit tool](https://github.com/sqlparser/sqlflow_public/tree/master/grabit#running-the-grabit-tool), this is the first UI.
Click the `database` button.
![Grabit azure UI 1](grabit-azure-1.png)
- Select `azure` in the list
![Grabit azure UI 2 database](grabit-azure-2-database.png)
- Set the database parameters. In this example, we only discover the data lineage in the DEMO_DB/PUBLIC schema.
![Grabit azure UI 3 database parameters](grabit-azure-3-database-parameters.png)
- note
1. The `Database` parameter must be specified.
2. When the `ExtractedDBSSchemas` and `ExcludedDBSSchemas` parameters are null, all data for the currently connected database is retrieved by default.
3. If you just want to get all the data in the specified database, you can use the following configuration: `ExtractedDBSSchemas: db/*`.
- After grabbing the metadata from the Azure database, connect to the SQLFlow server.
You need [a premium account](https://github.com/sqlparser/sqlflow_public/blob/master/sqlflow-userid-secret.md) to access the SQLFlow Cloud.
![Grabit azure SQLFlow](grabit-azure-4-sqlflow.png)
- Submit the database metadata to the SQLFlow server and get the data lineage
![Grabit azure SQLFlow result](grabit-azure-5-sqlflow-result.png)
- Check out the diagram via this url: [https://sqlflow.gudusoft.com/#/job/latest](https://sqlflow.gudusoft.com/#/job/latest)
![Grabit azure data lineage result](grabit-azure-6-data-lineage-result.png)
- You may save the data lineage in JSON/CSV/GRAPHML format.
The file will be saved under the `data\datalineage` directory.
### Further information
This tutorial illustrates how to discover the data lineage of an Azure database in the grabit UI mode.
If you would like to automate the data lineage discovery, you can use the grabit command-line mode.
- [Discover azure data lineage in command line mode](grabit-azure-command-line.md)
This tutorial illustrates how to discover the data lineage of an Azure database by submitting the database
metadata to the SQLFlow Cloud version. You may set up the [SQLFlow on-premise version](https://www.gudusoft.com/sqlflow-on-premise-version/)
on your own server to keep your information secure.
For more options of the grabit tool, please check this page.
- [Grabit tool readme](https://github.com/sqlparser/sqlflow_public/tree/master/grabit)
The completed guide of SQLFlow UI
- [How to use SQLFlow](https://github.com/sqlparser/sqlflow_public/blob/master/sqlflow_guide.md)

View File

@ -0,0 +1,244 @@
## Grabit Database Connection Information Document
Specify a database instance that grabit will connect to in order to fetch the metadata, which helps SQLFlow make a more precise analysis and a more accurate data lineage result.
#### Database Connection Information UI
![Database Connection Information UI](connection.jpg)
#### Parameter Specification Of Connection Information
#### hostname
The IP address of the database server that grabit connects to.
#### port
The port number of the database server that grabit connects to.
#### username
The database user used to log in to the database.
#### password
The password of the database user.
note: the password can be encrypted using the [Encrypted password](#Encrypted password) tool; using an encrypted password is more secure.
#### privateKeyFile
Use a private key to connect; only supported for `snowflake`.
#### privateKeyFilePwd
The password of the private key; only supported for `snowflake`.
#### database
The name of the database instance to which it is connected.
For azure, greenplum, netezza, oracle, postgresql, redshift, and teradata databases, it represents the database name and is required; for other databases, it is optional.
`
note:
If this parameter is specified and the connected database is Azure, Greenplum, PostgreSQL, or Redshift, then only metadata under that database is extracted.
`
#### extractedDbsSchemas
List of databases and schemas to extract, separated by
commas, provided in the format database/schema;
or leave it blank to extract all databases.
`database1/schema1,database2/schema2,database3` or `database1.schema1,database2.schema2,database3`
When the parameter `database` is filled in, this parameter is treated as a schema.
Wildcard characters are supported, such as `database1/*`,`*/schema`,`*/*`.
When the connected database is `Oracle` or `Teradata`, this parameter specifies the schemas, for example:
````json
extractedDbsSchemas: "HR,SH"
````
When the connected database is `Mysql`, `Sqlserver`, `Postgresql`, `Snowflake`, `Greenplum`, `Redshift`, `Netezza`, or `Azure`, this parameter specifies database/schema, for example:
````json
extractedDbsSchemas: "MY/ADMIN"
````
#### excludedDbsSchemas
This parameter works on the result set filtered by `extractedDbsSchemas`.
List of databases and schemas to exclude from extraction, separated by commas:
`database1/schema1,database2` or `database1.schema1,database2`
When the parameter `database` is filled in, this parameter is treated as a schema.
Wildcard characters are supported, such as `database1/*`,`*/schema`,`*/*`.
When the connected database is `Oracle` or `Teradata`, this parameter specifies the schemas, for example:
````json
excludedDbsSchemas: "HR"
````
When the connected database is `Mysql`, `Sqlserver`, `Postgresql`, `Snowflake`, `Greenplum`, `Redshift`, `Netezza`, or `Azure`, this parameter specifies database/schema, for example:
````json
excludedDbsSchemas: "MY/*"
````
#### extractedStoredProcedures
A list of stored procedures under the specified database and schema to extract, separated by
commas, provided in the format database.schema.procedureName or schema.procedureName;
or leave it blank to extract all; expressions are supported.
`database1.schema1.procedureName1,database2.schema2.procedureName2,database3.schema3,database4` or `database1/schema1/procedureName1,database2/schema2`
for example:
````json
extractedStoredProcedures: "database.scott.vEmp*"
````
or
````json
extractedStoredProcedures: "database.scott"
````
#### extractedViews
A list of views under the specified database and schema to extract, separated by
commas, provided in the format database.schema.viewName or schema.viewName;
or leave it blank to extract all; expressions are supported.
`database1.schema1.procedureName1,database2.schema2.procedureName2,database3.schema3,database4` or `database1/schema1/procedureName1,database2/schema2`
for example:
````json
extractedViews: "database.scott.vEmp*"
````
or
````json
extractedViews: "database.scott"
````
#### enableQueryHistory
Fetch SQL queries from the query history if set to `true`; the default is `false`.
#### queryHistoryBlockOfTimeInMinutes
When `enableQueryHistory` is `true`, the time window (in minutes) of SQL queries to extract from the query history; the default is `30` minutes.
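A hedged example that enables query-history extraction and widens the window to two hours (the 120-minute value is only illustrative):
````json
"enableQueryHistory": true,
"queryHistoryBlockOfTimeInMinutes": 120
````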
#### queryHistorySqlType
When `enableQueryHistory` is `true`, the types of SQL statements to extract from the query history.
When empty, all types are extracted; when multiple types are specified, separate them with commas, such as `SELECT,UPDATE,MERGE`.
Currently only the Snowflake database supports this parameter. Supported types are **SHOW, SELECT, INSERT, UPDATE, DELETE, MERGE, CREATE TABLE, CREATE VIEW, CREATE PROCEDURE, CREATE FUNCTION**.
for example:
````json
"queryHistorySqlType": "SELECT,DELETE"
````
#### snowflakeDefaultRole
This value is the default role used when connecting to the Snowflake database.
````
note: You must define a role that has access to the SNOWFLAKE database, and assign WAREHOUSE privileges to this role.
````
Assign permissions to a role, for example:
````sql
-- create the role and grant it access to the SNOWFLAKE database
use role accountadmin;
grant imported privileges on database snowflake to role sysadmin;
grant imported privileges on database snowflake to role customrole1;
use role customrole1;
select * from snowflake.account_usage.databases;
-- the role also needs privileges on the WAREHOUSE
select current_warehouse();
use role sysadmin;
GRANT ALL PRIVILEGES ON WAREHOUSE %current_warehouse% TO ROLE customrole1;
````
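Once the role has these privileges, it can be referenced in the grabit configuration; a minimal sketch using the role created in the SQL above:
````json
"snowflakeDefaultRole": "customrole1"
````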
#### metaStore
If the current data source is a `Hive` or `Spark` data store, this parameter can be set to `hive` or `sparksql`. By default, this parameter is left blank.
Sample configuration of a SQL Server database:
```json
"hostname":"127.0.0.1",
"port":"1433",
"username":"sa",
"password":"PASSWORD",
"database":"",
"extractedDbsSchemas":"AdventureWorksDW2019/dbo",
"excludedDbsSchemas":"",
"extractedStoredProcedures":"AdventureWorksDW2019.dbo.f_qry*",
"extractedViews":"",
"enableQueryHistory":false,
"queryHistoryBlockOfTimeInMinutes":30,
"snowflakeDefaultRole":"",
"queryHistorySqlType":"",
"metaStore":"hive"
```
#### sqlsourceTableName
If you save SQL queries in a specific table, one SQL query per row, set this parameter to the name of that table.
Let's say the table is **query_table** and `query_table.query_source` stores the source code of each query:

| query_name | query_source                        |
| ---------- | ----------------------------------- |
| query1     | create view v1 as select f1 from t1 |
| query2     | create view v2 as select f2 from t2 |
| query3     | create view v3 as select f3 from t3 |
We can use this query to fetch all SQL queries in this table:
```sql
select query_name as queryName, query_source as querySource from query_table
```
By setting the values of `sqlsourceTableName`, `sqlsourceColumnQuerySource`, and `sqlsourceColumnQueryName`,
grabit can fetch all SQL queries in this table and send them to SQLFlow to analyze the lineage.
In this example,
```
"sqlsourceTableName":"query_table"
"sqlsourceColumnQuerySource":"query_source"
"sqlsourceColumnQueryName":"query_name"
```
Please leave `sqlsourceTableName` empty if you don't fetch SQL queries from a specific table.
#### sqlsourceColumnQuerySource
The name of the column that stores the source code of the query. In the above sample:
```
"sqlsourceColumnQuerySource":"query_source"
```
#### sqlsourceColumnQueryName
The name of the column that stores the query name. In the above sample:
```
"sqlsourceColumnQueryName":"query_name"
```
This parameter is optional; you don't need to specify a query name column if one doesn't exist in the table.
- **fetch from query history**
Fetch SQL queries from the query history if set to `yes`; the default is `no`. Grabit runs a SQL statement that retrieves the execution history from the database to which it is connected. You can specify the time window of execution history to retrieve; the default is 30 minutes.
note: Currently only Snowflake and Sqlserver are supported.
View File

@ -0,0 +1 @@
## Greenplum

View File

@ -0,0 +1,29 @@
### Discover data lineage from Hive alter table set location
```sql
ALTER TABLE a.b SET LOCATION 's3://xxx/xx/1/xxx/';
```
#### output lineage in diagram
![Hive alter table set location data lineage](alter_table_set_location_data_lineage.png)
#### output lineage in xml
```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<dlineage>
<path id="8" name="'s3://xxx/xx/1/xxx/'" uri="'s3://xxx/xx/1/xxx/'" type="path" coordinate="[1,30,0],[1,50,0]">
<column id="9" name="*" coordinate="[-1,-1,0],[-1,-1,0]"/>
</path>
<process id="6" name="Query Set Table Location-1" procedureName="batchQueries" queryHashId="05e88d7c9059de6a9fcbf0b185930152" type="sstaltertable" coordinate="[1,1,0],[1,51,0]"/>
<table id="4" database="a" name="a.b" type="table" processIds="6" coordinate="[1,13,0],[1,16,0]">
<column id="5" name="*" coordinate="[1,1,0],[1,2,0]"/>
</table>
<relationship id="1" type="fdd" processId="6" processType="sstaltertable">
<target id="5" column="*" parent_id="4" parent_name="a.b" coordinate="[1,1,0],[1,2,0]"/>
<source id="9" column="*" parent_id="8" parent_name="'s3://xxx/xx/1/xxx/'" coordinate="[-1,-1,0],[-1,-1,0]"/>
</relationship>
</dlineage>
```
This data lineage in XML is generated by the [Gudu SQLFlow Java tool](https://www.gudusoft.com/sqlflow-java-library-2/)
62
databases/hive/readme.md Normal file
View File

@ -0,0 +1,62 @@
## Hive data lineage examples
- [Alter table set location](alter_table_set_location.md)
## connect to hive metastore
Use the grabit command line to connect to a MySQL database that stores the
Hive metastore, fetch the metadata from the Hive metastore, and send it
to SQLFlow to analyze the data lineage.
### config file
```json
{
"databaseServer":{
"hostname":"",
"port":"3306",
"username":"",
"password":"",
"database":"",
"extractedDbsSchemas":"",
"excludedDbsSchemas":"",
"extractedStoredProcedures":"",
"extractedViews":"",
"metaStore":"hive"
},
"SQLFlowServer":{
"server":"http://127.0.0.1",
"serverPort":"8081",
"userId":"gudu|0123456789",
"userSecret":""
},
"SQLScriptSource":"database",
"lineageReturnFormat":"json",
"databaseType":"mysql"
}
```
Please make sure to set `database` to the name of the MySQL database
which stores the Hive metastore.
The IP address below should be that of the machine where the SQLFlow on-premise version is installed.
```
"server":"http://127.0.0.1",
```
### command line syntax
- **mac & linux**
```shell script
chmod +x start.sh
sh start.sh /f config.json
```
- **windows**
```bat
start.bat /f config.json
```
## download the latest version of the grabit tool
https://www.gudusoft.com/grabit/