Azure Data Factory – Aggregate transformation in mapping data flow

Mapping data flow comes with many transformation options. While working with data flows, you need to incorporate appropriate transformations to get the desired result. The Aggregate transformation helps to perform aggregations of data using Count, Min, Max, and Sum with expression builder in ADF.

So let’s begin with the implementation:-

Implementation

We are going to use below Azure services to implement Aggregate transformation:-

  • Azure Data Factory
  • SQL Server Linked Service
  • Azure SQL Server database

* Steps are not covered to create these services in Azure.

Data Creation: First, a database table is needed with some numeric data in Azure SQL Database. The Table name is “OrderDetails” and it has columns OrderId, ProductID, UnitPrice, Quantity, and TotalSales.

Scripts for the database table and sample data creation are below.

--Create Table
Create Table OrderDetails
(
	OrderId INT, 
	ProductID INT, 
	UnitPrice INT, 
	Quantity INT, 
	TotalSales INT
)
--Sample Data, This is not the complete data.
INSERT INTO OrderDetails Values(1,118,297,3,891)
INSERT INTO OrderDetails Values(2,358,265,5,1325)
INSERT INTO OrderDetails Values(3,172,114,5,570)
INSERT INTO OrderDetails Values(4,206,140,3,420)
INSERT INTO OrderDetails Values(5,446,230,1,230)
INSERT INTO OrderDetails Values(6,263,222,1,222)
INSERT INTO OrderDetails Values(7,188,150,4,600)
INSERT INTO OrderDetails Values(8,140,134,4,536)
INSERT INTO OrderDetails Values(9,208,102,2,204)
INSERT INTO OrderDetails Values(10,406,100,5,500)
INSERT INTO OrderDetails Values(11,349,111,5,555)
INSERT INTO OrderDetails Values(12,495,252,1,252)
INSERT INTO OrderDetails Values(13,213,227,2,454)
INSERT INTO OrderDetails Values(14,427,227,2,454)
INSERT INTO OrderDetails Values(15,336,112,1,112)

SQL Linked Service: Second, a SQL Linked Service requires which would be used by dataset to store the data from the SQL Server in Azure Data Factory. Do make sure that linked service should be able to connect with SQL Server database successfully. If you want to implement linked service Parametrization, you can follow the post. Let’s name the linked service “mappinglinkedservice“.

SQL Dataset: Third, the SQL Server dataset requires to store the data during Azure Data Factory execution. Let’s name it “orderdetailsdataset” and click OK. If needed, click on the “Preview data” to verify the dataset data.

1. SQL Dataset

Mapping Dataflow: Forth, mapping data flow needs which performs the aggregation of data. Let’s name it “aggregate-transformation“. The next step is to add a data source that is nothing but the dataset “orderdetailsdataset.

2. Aggregation Flow
3. Add Dataset

Aggregate Transformation: This is an inbuilt transformation component in Azure Data Factory. Click on the “+” next to the orderdetailsdataset.

This transformation can be used to get sum, average, sumif, averageif, mean, and meanif of numeric columns.

We can use ProductID for groping the data and perform sum and avg of TotalSales using Aggregate Transformation.

Apply the settings for Group by and Aggregates as shown below:-

4. Add Transformation
5. Group By Setting
6. Aggregate Columns Expression

The Data Preview is not enabled by default while working with Data Flows. We need to enable Data Flow Debug to see the data preview at transformation level which we can see here.

7. Data Preview

More options available to transform the previewed data using Typecast and Modify and can be used as needed.

Typecast provides string, short, byte, integer, long, float, double, decimal, and boolean options to cast the data.

Modify provides Round, Floor, Ceil, Abs, and Sqrt to get the desired numeric formatting.

We want to see the rounded values for the TotalSales Average column then we can leverage the Round option under the Modify menu. Updated data is is shown as below:-

8. Rounded Preview

So, we have seen Aggregate Transformation using one of the simplest ways in Azure Data Factory. This transformation practices to avoid data grouping at the SQL Server side or any other data source side. So, this is the final data using Aggregate Transformation.

If you have any suggestions/feedback, please put them in the comment box.

Happy Learning 🙂

Leave a Reply

Up ↑

%d bloggers like this: