
Mapping data flow is an inbuilt data transformation tool in Azure data factory. It provides data transformation and visualization interface where no or very limited coding is needed.
There are various transformation categories such as schema modifiers, row modifier, multiple inputs and outputs, formatters, flowlets and destination or sink. These categories are having number of transformation to pefrom data transformation. Let’s see available transformation in each category and when to use them.
Schema Modifiers
Name | When to use |
Select | To select needed column, add column alias, remove and reorder columns |
Aggregate | To derive COUNT, MIN, MAX and SUM for a numeric column |
Cast | To modify column data type with type checking |
Derived Column | To add new column or modify existing column using expression language |
Pivot | To transform output of an aggregation where distinct row value becomes column header and aggregate value becomes row |
Rank | To generate order ranking based on sort conditions defined |
External Call | To call external rest or third party APIs, get the results and use with other data sources |
Surrogate Key | To add incremental unique key value for each row data. |
Unpivot | To transform columns into rows values |
Row Modifiers
Name | When to use |
Alter row | To apply condition policy as an expression to Insert, Update, Delete and Upsert on rows |
Assert | To apply assert rules for each row to assess data quality and data validation |
Filter | To filter the rows based on a condition |
Sort | To sort the data stream upon a set columns in Ascending or Descending order |
Multiple Inputs and Outputs
Name | When to use |
Conditional Split | To split the input data stream into multiple data streams based upon matching conditions |
Exists | To check if data exists in data stream. This behaves as SQL EXISTS and NOT EXISTS |
Join | To combine two input streams based on join conditions on columns. Data flows supports Inner Join, Left Outer, Right Outer, Full Outer and Custom Cross Join |
Lookup | To reference data from input stream as per the match on conditions defined. |
New Branch | To perform multiple sets of operations and transformations on the same data stream. |
Union | To generate single data stream from multiple data streams |
Formatters
Name | When to use |
Flatten | To transform array values of hierarchical structures ex. JSON and unroll into individual rows |
Parse | Parse text columns that are strings of JSON, delimited text, or XML formatted text. |
Stringify | To transform complex objects or data types into string |
Flowlets
Name | When to use |
Flowlet | To create custom re-usable transformation logic to use in data flows |
Consulsion
So, here is the list available transformations in the mapping data flow. Apart form the list, there are two the most transformations those are Source and Sink. Source transformation is an input for data flow and Sink transformation is an output for data flow. These transformations are the building blocks for the supported data formats in Azure Data Factory and Azure Synapse.
If you have any suggestions/feedback, please put them in the comment box.
Happy Learning 🙂
Leave a Reply