sqlflow_public/dbobjects_relationship_v1.md

197 lines
6.0 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

Relationships in the data lineage generated by the [SQLFlow](https://gudusoft.com/sqlflow/#/).
- [column relationship](#type-of-the-column-relationships)
- [table relationship](#type-of-the-table-relationships)
## Type of the column relationships
* **fdd**, the value of target column is come from source column, such as: t = s + 2
You may check `effectType` to see how the target column is changed.
- `effectType = select`, the source data is from select.
- `effectType = insert`, the source data is from insert.
- `effectType = update`, the source data is from update.
- `effectType = merge_update`, the source data is merge update.
- `effectType = merege_insert`, the source data is from merege insert.
- `effectType = create_view`, the source data is from create view.
* **fddi**, the value of the target column is not derived from the source column directly, but it is affected by the source column.
However, it's difficult to determine this kind of relation, take this syntax for example: t=fx(s).
so the relationship of the source and target column is `fdd` in a function call unless we know clearly that `t` will not
include value from `s`.
In this sample SQL, the relationship between `teur` and `kamut` is fddi.
```sql
select
case when a.kamut=1 and b.teur IS null
then 'no locks'
when a.kamut=1
then b.teur
else 'locks'
end teur
from tbl a left join TT b on (a.key=b.key)
```
* **fdr**, the value of target column affected by the row number of the source table. It always happens when the aggregate function is used.
the source column may be appeared in the wheregroup by clause. This kind of relationship may also appear between the target column and the source table.
* **frd**, the row number of target column is affected by the value of source column. The source column usually appears in the where clause.
The meaning of the letter in `fdd`, `fddi`, `fdr`, `frd`, f: dataflow, d: data value, r: record set.
The first letter is always fthe second letter represents the source columnthe third letter represents the target column, the fourth is reserved.
### fdd
The value of the target column is derived from the source column directly.
```sql
create view vEmp(eName) as
SELECT a.empName "eName"
FROM scott.emp a
Where sal > 1000
```
`vEmp.eName` <- fdd <- `"eName"` <- fdd <- `scott.emp.empName`
### fdr
The value of the target column is influenced by a source table itself, for example by the number of records.
This is caused by the use of aggregate function in the query.
```sql
create view vSal as
SELECT a.deptno "Department",
a.num_emp/b.total_count "Employees",
a.sal_sum/b.total_sal "Salary"
FROM
(SELECT deptno, COUNT() num_emp, SUM(SAL) sal_sum
FROM scott.emp
Where city = 'NYC'
GROUP BY deptno) a,
(SELECT COUNT() total_count, SUM(sal) total_sal
FROM scott.emp
Where city = 'NYC') b
```
`vSal.Salary` depends on the record number of table: `scott.emp`.
This is due to the use of aggregate function `COUNT()`, `SUM()` in the query,
`vSal.Salary` also depends on the `scott.emp.deptno` column which is used in the
group by clause.
The `city` column in the where clause also determines the value of `vSal.Salary`.
The chain of the dataflow is:
`vSal.Salary` <- fdd <- `query_resultset.Salary`
`query_resultset.Salary` <- fdd <- `sal_sum`
`query_resultset.Salary` <- fdd <- `total_sal`
`sal_sum` <- fdd <- `scott.emp.SAL`
`sal_sum` <- fdr <- `scott.emp`
`sal_sum` <- fdr <- `scott.emp.city`
`sal_sum` <- fdr <- `scott.emp.deptno`
`total_sal` <- fdd <- `scott.emp.sal`
`total_sal` <- fdr <- `scott.emp`
`total_sal` <- fdr <- `scott.emp.city`
### frd
Some of the columns in source tables such as WHERE clause do not influence the value of target columns
but are crucial for the selected row set, so they are also saved for impact analyses,
with relationship to the target columns.
```sql
create view vEmp(eName) as
SELECT a.empName "eName"
FROM scott.emp a
Where sal > 1000
```
The value of `vEmp.eName` doesnt depends on `scott.emp.sal`,
but the number of records in the `vEmp` depends on the `scott.emp.sal`,
so this tool record this kind of relationship as well
`vEmp.eName` <- fdd <- `query_resultset."eName"` <- fdd <- `scott.emp.empName`
`vEmp.eName` <- fdd <- `query_resultset."eName"` <- frd <- `scott.emp.sal`
### fddi
The value of the target column depends on the value of the source column, but not come from the source column.
```sql
select
case when a.kamut=1 and b.teur IS null
then 'no locks'
when a.kamut=1
then b.teur
else 'locks'
end teur
from tbl a left join TT b on (a.key=b.key)
```
The value of the select result: `teur` depends on the source column `tbl.kamut`
in the case expression, although it values is not derived from `tbl.kamut` directly.
`query_result.teur` <- fddi <- `tbl.kamut`
`query_result.teur` <- frd <- `tbl.key`
`query_result.teur` <- frd <- `TT.key`
## Type of the table relationships
During ETL, It's typically to work with the staging tables and later rename them to the prod table names, or even better, swap the prod and staging tables in a single operation.
So there is a need to understand this kind of transition.
```sql
create table t2 as select c from t1;
alter table t2 rename to t3;
create table t4 as select * from t3;
OR
create table t2 as select c from t1;
alter table t2 swap with t3;
create table t4 as select * from t3;
```
We also use `fdd` to reprenst those 2 relations.
```sql
alter table t2 rename to t3;
```
This relation should be built in the dataflow output.
`t2` -> fdd -> `t3`
```xml
<relation id="1" type="fdd" effectType="renameTable">
<target id="5" target_name="t3" coordinate="x,y"/>
<source source_id="1" source_name="t2" coordinate="x,y"/>
</relation>
```
```sql
alter table t2 swap with t3;
```
This relation should be built in the dataflow output.
`t2` -> fdd -> `t3`
```xml
<relation id="1" type="fdd" effectType="swapTable">
<target id="5" target_name="t2" coordinate="x,y"/>
<source source_id="1" source_name="t3" coordinate="x,y"/>
</relation>
```