Mapping Data Flow Transformations Azure Data Factory and Azure Synapse

Mapping data flow is an inbuilt data transformation tool in Azure data factory. It provides data transformation and visualization interface where no or very limited coding is needed.

There are various transformation categories such as schema modifiers, row modifier, multiple inputs and outputs, formatters, flowlets and destination or sink. These categories are having number of transformation to pefrom data transformation. Let’s see available transformation in each category and when to use them.

Schema Modifiers

NameWhen to use
SelectTo select needed column, add column alias, remove and reorder columns
AggregateTo derive COUNT, MIN, MAX and SUM for a numeric column
CastTo modify column data type with type checking
Derived ColumnTo add new column or modify existing column using expression language
PivotTo transform output of an aggregation where distinct row value becomes column header and aggregate value becomes row
RankTo generate order ranking based on sort conditions defined
External CallTo call external rest or third party APIs, get the results and use with other data sources
Surrogate KeyTo add incremental unique key value for each row data.
UnpivotTo transform columns into rows values

Row Modifiers

NameWhen to use
Alter rowTo apply condition policy as an expression to Insert, Update, Delete and Upsert on rows
AssertTo apply assert rules for each row to assess data quality and data validation
FilterTo filter the rows based on a condition
SortTo sort the data stream upon a set columns in Ascending or Descending order

Multiple Inputs and Outputs

NameWhen to use
Conditional SplitTo split the input data stream into multiple data streams based upon matching conditions
ExistsTo check if data exists in data stream. This behaves as SQL EXISTS and NOT EXISTS
JoinTo combine two input streams based on join conditions on columns. Data flows supports Inner Join, Left Outer, Right Outer, Full Outer and Custom Cross Join
LookupTo reference data from input stream as per the match on conditions defined.
New BranchTo perform multiple sets of operations and transformations on the same data stream.
UnionTo generate single data stream from multiple data streams

Formatters

NameWhen to use
FlattenTo transform array values of hierarchical structures ex. JSON and unroll into individual rows
ParseParse text columns that are strings of JSON, delimited text, or XML formatted text.
StringifyTo transform complex objects or data types into string

Flowlets

NameWhen to use
FlowletTo create custom re-usable transformation logic to use in data flows

Consulsion

So, here is the list available transformations in the mapping data flow. Apart form the list, there are two the most transformations those are Source and Sink. Source transformation is an input for data flow and Sink transformation is an output for data flow. These transformations are the building blocks for the supported data formats in Azure Data Factory and Azure Synapse.

If you have any suggestions/feedback, please put them in the comment box.

Happy Learning 🙂

Leave a Reply

Up ↑

%d bloggers like this: