sqlflow_public/dbobjects_relationship_v1.md

Relationships in the data lineage generated by the [SQLFlow](https://gudusoft.com/sqlflow/#/).

- [column relationship](#type-of-the-column-relationships)
- [table relationship](#type-of-the-table-relationships)

## Type of the column relationships

* **fdd**,  the value of target column is come from source column, such as: t = s + 2

	You may check `effectType` to see how the target column is changed.

	- `effectType = select`, the source data is from select.
	- `effectType = insert`, the source data is from insert.
	- `effectType = update`, the source data is from update.
	- `effectType = merge_update`, the source data is merge update.
	- `effectType = merege_insert`, the source data is from merege insert.
	- `effectType = create_view`, the source data is from create view.

* **fddi**, the value of the target column is not derived from the source column directly, but it is affected by the source column.
		However, it's difficult to determine this kind of relation, take this syntax for example: t=fx(s).
		so the relationship of the source and target column is `fdd` in a function call unless we know clearly that `t` will not
		include value from `s`.

	In this sample SQL, the relationship between `teur` and `kamut` is fddi.

	```sql
	select
		case when a.kamut=1 and b.teur IS null
				 then 'no locks'
			 when a.kamut=1
				then b.teur
		else 'locks'
		end teur
	from tbl a left join TT b on (a.key=b.key)
	```


* **fdr**, the value of target column affected by the row number of the source table. It always happens when the aggregate function is used.
	the source column may be appeared in the where，group by clause. This kind of relationship may also appear between the target column and the source table.


* **frd**, the row number of target column is affected by the value of source column. The source column usually appears in the where clause.


The meaning of the letter in `fdd`, `fddi`, `fdr`, `frd`, f: dataflow, d: data value, r: record set.

The first letter is always f，the second letter represents the source column，the third letter represents the target column, the fourth is reserved.


### fdd
The value of the target column is derived from the source column directly.
```sql
create view vEmp(eName) as
SELECT a.empName "eName"
FROM scott.emp a
Where sal > 1000
```

`vEmp.eName` <- fdd <- `"eName"` <- fdd <- `scott.emp.empName`

### fdr
The value of the target column is influenced by a source table itself, for example by the number of records.
This is caused by the use of aggregate function in the query.
```sql
create view vSal as
SELECT a.deptno "Department",
    a.num_emp/b.total_count "Employees",
    a.sal_sum/b.total_sal "Salary"
FROM
    (SELECT deptno, COUNT() num_emp, SUM(SAL) sal_sum
        FROM scott.emp
       Where city = 'NYC'
       GROUP BY deptno) a,
    (SELECT COUNT() total_count, SUM(sal) total_sal
        FROM scott.emp
       Where city = 'NYC') b
```

`vSal.Salary` depends on the record number of table: `scott.emp`.
This is due to the use of aggregate function `COUNT()`, `SUM()` in the query,
`vSal.Salary` also depends on the `scott.emp.deptno` column which is used in the
group by clause.

The `city` column in the where clause also determines the value of `vSal.Salary`.

The chain of the dataflow is:

`vSal.Salary` <- fdd <- `query_resultset.Salary`

`query_resultset.Salary` <- fdd <- `sal_sum`

`query_resultset.Salary` <- fdd <- `total_sal`

`sal_sum` <- fdd <- `scott.emp.SAL`

`sal_sum` <- fdr <- `scott.emp`

`sal_sum` <- fdr <- `scott.emp.city`

`sal_sum` <- fdr <- `scott.emp.deptno`

`total_sal` <- fdd <- `scott.emp.sal`

`total_sal` <- fdr <- `scott.emp`

`total_sal` <- fdr <- `scott.emp.city`


### frd
Some of the columns in source tables such as WHERE clause do not influence the value of target columns
but are crucial for the selected row set, so they are also saved for impact analyses,
with relationship to the target columns.
```sql
create view vEmp(eName) as
SELECT a.empName "eName"
FROM scott.emp a
Where sal > 1000
```
The value of `vEmp.eName` doesn’t depends on `scott.emp.sal`,
but the number of records in the `vEmp` depends on the `scott.emp.sal`,
so this tool record this kind of relationship as well

`vEmp.eName` <- fdd <- `query_resultset."eName"` <- fdd <- `scott.emp.empName`

`vEmp.eName` <- fdd <- `query_resultset."eName"` <- frd <- `scott.emp.sal`

### fddi
The value of the target column depends on the value of the source column, but not come from the source column.
```sql
select
	case when a.kamut=1 and b.teur IS null
			 then 'no locks'
		 when a.kamut=1
			then b.teur
	else 'locks'
	end teur
from tbl a left join TT b on (a.key=b.key)
```

The value of the select result: `teur` depends on the source column `tbl.kamut`
in the case expression, although it values is not derived from `tbl.kamut` directly.

`query_result.teur` <- fddi <- `tbl.kamut`

`query_result.teur` <- frd <- `tbl.key`

`query_result.teur` <- frd <- `TT.key`


## Type of the table relationships
During ETL, It's typically to work with the staging tables and later rename them to the prod table names, or even better, swap the prod and staging tables in a single operation.
So there is a need to understand this kind of transition.

```sql
create table t2 as select c from t1;
alter table t2 rename to t3;
create table t4 as select * from t3;

OR

create table t2 as select c from t1;
alter table t2 swap with t3;
create table t4 as select * from t3;
```

We also use `fdd` to reprenst those 2 relations.

```sql
alter table t2 rename to t3;
```
This relation should be built in the dataflow output.
`t2` -> fdd -> `t3`

```xml
<relation id="1" type="fdd" effectType="renameTable">
  <target id="5" target_name="t3" coordinate="x,y"/>
  <source source_id="1" source_name="t2" coordinate="x,y"/>
</relation>
```


```sql
alter table t2 swap with t3;
```
This relation should be built in the dataflow output.
`t2` -> fdd -> `t3`

```xml
<relation id="1" type="fdd" effectType="swapTable">
  <target id="5" target_name="t2" coordinate="x,y"/>
  <source source_id="1" source_name="t3" coordinate="x,y"/>
</relation>
```