6.0 KiB
Relationships in the data lineage generated by the SQLFlow.
Type of the column relationships
-
fdd, the value of target column is come from source column, such as: t = s + 2
You may check
effectTypeto see how the target column is changed.effectType = select, the source data is from select.effectType = insert, the source data is from insert.effectType = update, the source data is from update.effectType = merge_update, the source data is merge update.effectType = merege_insert, the source data is from merege insert.effectType = create_view, the source data is from create view.
-
fddi, the value of the target column is not derived from the source column directly, but it is affected by the source column. However, it's difficult to determine this kind of relation, take this syntax for example: t=fx(s). so the relationship of the source and target column is
fddin a function call unless we know clearly thattwill not include value froms.In this sample SQL, the relationship between
teurandkamutis fddi.select case when a.kamut=1 and b.teur IS null then 'no locks' when a.kamut=1 then b.teur else 'locks' end teur from tbl a left join TT b on (a.key=b.key) -
fdr, the value of target column affected by the row number of the source table. It always happens when the aggregate function is used. the source column may be appeared in the where,group by clause. This kind of relationship may also appear between the target column and the source table.
-
frd, the row number of target column is affected by the value of source column. The source column usually appears in the where clause.
The meaning of the letter in fdd, fddi, fdr, frd, f: dataflow, d: data value, r: record set.
The first letter is always f,the second letter represents the source column,the third letter represents the target column, the fourth is reserved.
fdd
The value of the target column is derived from the source column directly.
create view vEmp(eName) as
SELECT a.empName "eName"
FROM scott.emp a
Where sal > 1000
vEmp.eName <- fdd <- "eName" <- fdd <- scott.emp.empName
fdr
The value of the target column is influenced by a source table itself, for example by the number of records. This is caused by the use of aggregate function in the query.
create view vSal as
SELECT a.deptno "Department",
a.num_emp/b.total_count "Employees",
a.sal_sum/b.total_sal "Salary"
FROM
(SELECT deptno, COUNT() num_emp, SUM(SAL) sal_sum
FROM scott.emp
Where city = 'NYC'
GROUP BY deptno) a,
(SELECT COUNT() total_count, SUM(sal) total_sal
FROM scott.emp
Where city = 'NYC') b
vSal.Salary depends on the record number of table: scott.emp.
This is due to the use of aggregate function COUNT(), SUM() in the query,
vSal.Salary also depends on the scott.emp.deptno column which is used in the
group by clause.
The city column in the where clause also determines the value of vSal.Salary.
The chain of the dataflow is:
vSal.Salary <- fdd <- query_resultset.Salary
query_resultset.Salary <- fdd <- sal_sum
query_resultset.Salary <- fdd <- total_sal
sal_sum <- fdd <- scott.emp.SAL
sal_sum <- fdr <- scott.emp
sal_sum <- fdr <- scott.emp.city
sal_sum <- fdr <- scott.emp.deptno
total_sal <- fdd <- scott.emp.sal
total_sal <- fdr <- scott.emp
total_sal <- fdr <- scott.emp.city
frd
Some of the columns in source tables such as WHERE clause do not influence the value of target columns but are crucial for the selected row set, so they are also saved for impact analyses, with relationship to the target columns.
create view vEmp(eName) as
SELECT a.empName "eName"
FROM scott.emp a
Where sal > 1000
The value of vEmp.eName doesn’t depends on scott.emp.sal,
but the number of records in the vEmp depends on the scott.emp.sal,
so this tool record this kind of relationship as well
vEmp.eName <- fdd <- query_resultset."eName" <- fdd <- scott.emp.empName
vEmp.eName <- fdd <- query_resultset."eName" <- frd <- scott.emp.sal
fddi
The value of the target column depends on the value of the source column, but not come from the source column.
select
case when a.kamut=1 and b.teur IS null
then 'no locks'
when a.kamut=1
then b.teur
else 'locks'
end teur
from tbl a left join TT b on (a.key=b.key)
The value of the select result: teur depends on the source column tbl.kamut
in the case expression, although it values is not derived from tbl.kamut directly.
query_result.teur <- fddi <- tbl.kamut
query_result.teur <- frd <- tbl.key
query_result.teur <- frd <- TT.key
Type of the table relationships
During ETL, It's typically to work with the staging tables and later rename them to the prod table names, or even better, swap the prod and staging tables in a single operation. So there is a need to understand this kind of transition.
create table t2 as select c from t1;
alter table t2 rename to t3;
create table t4 as select * from t3;
OR
create table t2 as select c from t1;
alter table t2 swap with t3;
create table t4 as select * from t3;
We also use fdd to reprenst those 2 relations.
alter table t2 rename to t3;
This relation should be built in the dataflow output.
t2 -> fdd -> t3
<relation id="1" type="fdd" effectType="renameTable">
<target id="5" target_name="t3" coordinate="x,y"/>
<source source_id="1" source_name="t2" coordinate="x,y"/>
</relation>
alter table t2 swap with t3;
This relation should be built in the dataflow output.
t2 -> fdd -> t3
<relation id="1" type="fdd" effectType="swapTable">
<target id="5" target_name="t2" coordinate="x,y"/>
<source source_id="1" source_name="t3" coordinate="x,y"/>
</relation>