3.9 KiB
Data lineage analysis command line tool
Dlineage analysis command line tool is used to analyze the data lineage in column level of a SQL query and output the data lineage in the XML/JSON format.
By default, the tool will output the data lineage in the XML format and includes the most detailed information except the following elements which will be ignored:
- the constant value (can be turned on using
/showConstantoption) - transform: the transform code (can be turned on using
/transformoption) - join: the join relation (can be turned on using
/joption)
Table level data lineage
This data lineage tool generate the data lineage at column level by default.
If you want to generate the data lineage at table level, you can use /tableLineage option.
Control the output of specific relation types
In the data lineage analysis, there are several types of relations, such as:
- fdd: the direct data flow
- fdr: the indirect data flow
- join: the join relation
- call: the call relation
- er: the entity-relationship relation
You may want to control which relation types are included in the output. The /includeRelationTypes option allows you to specify which relation types should be included. The syntax is:
/includeRelationTypes fdd,fdr,join,call,er
Please note that relation type not specified in the /includeRelationTypes option will not be output.
Parameters
/showResultSetTypes
There are many types of the intermediate result sets will be generated during the analysis of the data lineage in SQL query.
When we ignore all the intermediate result sets (using /s option), the data lineage generated only includes the base tables and views.
However, sometimes, we may want to see the data lineage of the intermediate result sets, for example, we want to see the data lineage of the result set generated by the aggregate function, or the result set generated by the insert/update clause in merge statement.
In this case, we can use /showResultSetTypes option to specify the types of the result sets to be output.
The result set types can be specified in the following format:
/showResultSetTypes array, struct, result_of, cte, insert_select, update_select, merge_update, merge_insert, output, update_set, pivot_table, unpivot_table, alias, rs, function, case_when
This option is valid only when /s option is used.
/withTemporaryTable
Temporary table such as #temp_table in SQL Server will not be output in the simple output by default.
If you want to output the temporary table, you can use /withTemporaryTable option.
/s /withTemporaryTable
/j
Option to output the join relation.
-- Oracle sample query
SELECT e.employee_id, e.first_name, e.last_name, d.department_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id
ORDER BY e.employee_id;
call relation
Option to output the call relation.
-- SQL Server sample query
CREATE PROCEDURE [dbo].[usp_insert]
( @a varchar(50), @b varchar(15), @c varchar(6), @d varchar(50) )
AS
BEGIN
if ((select count(*) from tbl_Log1) <50000)
exec [dbo].[usp_insert_into_Log1] @a,@b,@c,@d
else
exec [dbo].[usp_insert_into_Log2] @a,@b,@c,@d
END
generated lineage with call relation:
<relationship id="3" type="call">
<caller id="1" name="[dbo].[usp_insert]" type="procedure"/>
<callee id="29" name="[dbo].[usp_insert_into_Log1]" type="procedure"/>
</relationship>
<relationship id="4" type="call">
<caller id="1" name="[dbo].[usp_insert]" type="procedure"/>
<callee id="30" name="[dbo].[usp_insert_into_Log2]" type="procedure"/>
</relationship>