4.0 KiB
When analyzing data lineage, the complete model is always generated since all other higher level models are based on this model.
1. Types of entity
Table, view, column, process(SQL statement), resultset, function, variale, path.
2. Types of relation
fdd
The fdd relation means the data of the target entity comes from the source entity. Take this SQL query for example:
SELECT a.empName "eName"
FROM scott.emp a
the data of target column "eName" comes from scott.emp.empName, so we have a dataflow relation like this:
scott.emp.empName -> fdd -> "eName"
the result generated by the select list is called: resultset which ikes a virtual table includes columns and rows.
fdr
The fdr relation means the data of the source column will impact the row number of the resultset in the select list, or will impact the result value of an anggreate function.
SELECT a.empName "eName"
FROM scott.emp a
Where sal > 1000
The total number of rows in the select list is impacted by the value of column sal in the where clause. So we have a dataflow relation like this:
sal -> fdr -> resultset.PseudoRows
PseudoRows column
As you can see, we introduced a new pseudo column: PseudoRows to represents the number of rows in the resultset.
The fdr type dataflow is represented by a dash line. You can hide the fdr type dataflow by turn off the impact option in the SQLFlow.
You may find more examples about the fdr relation in the lineage in real SQL section.
join
The join relation build a link between 2 or more columns in the join condition. Striclty speaking, the relation is not a dataflow relation.
select b.teur
from tbl a left join TT b on (a.key=b.key)
A join relation will be created after analzye the above SQL. It indicates a join relation betwee tbl.key and TT.key.
3. Connect the entity using relation
When build relation between 2 entities, the source and target entity can be column to column, or, table to table.
column to column
This is the most often case that both entity in a relation are columns.
table to table
Sometimes, there will be a relation between 2 tables. For example, in an alter table rename SQL statement, a table to table relation will be created. Acutally, a table to table relation is represented by a column to column relation using the PseudoRows column.
Table to table relation is included in the complete lineage model by using PseudoRows for 2 reasons:
- This pseudo column to column relation will be used to generate a table-level lineage later if user need a table-level lineage model.
- If a column in this table is used in a column to column relation while the table itself is in a table to table relation, then, this pseudo column will make it possible for a single table to includes both the column to column relation and table to table relation.
take this SQL for example
create view v1 as select f1 from t2;
alter table t2 rename to t3;
column to column relation
As you can see, Table t2 involved in the column to column relation generated by the create view statement, It also involved in a table to table relation generated by the alter table statement. A single table t2 in the above digram show that it includes both the column to column relation and the table to table reation.
table-level lineage
With the table to table relation generated in the complete data lineage model, we can later use it to generate a table-level lineage like this:





