This article explains how to create an Azure Data Factory from the Azure Portal and use it to perform a copy activity between an Azure SQL Database and a Blob container. You need to create the services below to implement the copy pipeline:
- Create a resource group
- Create SQL Server Instance with one Database (Source Data)
- Create Storage Account with one Blob container
- Create Data Factory
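If you prefer the command line, the same resources can also be provisioned with the Azure CLI. The sketch below is only a rough equivalent of the portal steps; all resource names (`demo-sql-server`, `demodatastg`, `demo-data-factory`) are placeholders, and the last command assumes the `datafactory` CLI extension is installed:

```shell
# Resource group (name matches the one used later in this article)
az group create --name demo-resource-group --location eastus

# Logical SQL server plus one database (server name and credentials are placeholders)
az sql server create --name demo-sql-server --resource-group demo-resource-group \
  --location eastus --admin-user demoadmin --admin-password '<YourStrongPassword1>'
az sql db create --name demo-db --resource-group demo-resource-group \
  --server demo-sql-server --service-objective Basic

# Storage account and the blob container the pipeline will write to
az storage account create --name demodatastg --resource-group demo-resource-group \
  --location eastus --sku Standard_LRS
az storage container create --name employee --account-name demodatastg

# Data Factory (first run: az extension add --name datafactory)
az datafactory create --name demo-data-factory \
  --resource-group demo-resource-group --location eastus
```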
Let’s log in to the Azure Portal and create a resource group (demo-resource-group), and inside this resource group create a SQL Server and a Storage Account. Detailed steps for creating these services are not covered here because they are quite simple, but you can create an Azure Data Factory as shown below.
Your resource group should look like the one below; resource names may differ in your resource group.
- Connect to the SQL database by entering the username and password. You need to allow your client IP address in the firewall settings to connect to the SQL Server.
- Click on Query Editor in the left panel to open a query editor, then run the SQL commands below to create a table and insert some rows.
CREATE TABLE Employees (
    EmployeeID int,
    FirstName varchar(255),
    LastName varchar(255),
    City varchar(50),
    DateOfBirth date
);

INSERT INTO Employees VALUES (1, 'James', 'David', 'New York', '1991-12-01');
INSERT INTO Employees VALUES (2, 'Mike', 'Jane', 'Singapore', '1992-06-03');
INSERT INTO Employees VALUES (3, 'Juan', 'James', 'Tokyo', '1980-06-26');
INSERT INTO Employees VALUES (4, 'Sarah', 'Carlos', 'London', '1986-08-12');
INSERT INTO Employees VALUES (5, 'Garcia', 'Paul', 'Berlin', '1996-01-31');
INSERT INTO Employees VALUES (6, 'Brown', 'Daniel', 'New York', '1988-09-22');
INSERT INTO Employees VALUES (7, 'Williams', 'James', 'New York', '1981-10-25');
- Go back to the resource group, click on the storage account, and create a blob container named “employee” (blob container names must be lowercase).
Now we are ready with the source datastore (the SQL Database) and the destination datastore (the blob container).
Create Linked Services and Datasets
- Open the Data Factory resource and click on “Author & Monitor”, then wait for the Data Factory UI to open.
- Click on Pencil Icon on the left menu.
Before creating the actual pipeline, you have to create connections using linked services and datasets. Let’s create two linked services: one for the Azure SQL database and one for the Storage Account.
- Click on “Connections > Linked Service > New”, choose Azure SQL Database, enter the required details, and click “Test Connection”.
- Click on “Connections > Linked Service > New”, choose Azure Blob Storage, enter the required details, and click “Test Connection”.
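If you would rather pull the linked-service connection details from the CLI than from the portal, both can be fetched as below (resource names are the placeholders used earlier in this walkthrough):

```shell
# ADO.NET connection string template for the Azure SQL Database linked service
az sql db show-connection-string --client ado.net \
  --server demo-sql-server --name demo-db

# Primary access key for the Azure Blob Storage linked service
az storage account keys list --resource-group demo-resource-group \
  --account-name demodatastg --query "[0].value" --output tsv
```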
Now our linked services are ready to connect to the source and destination datastores.
Let’s create two new datasets using the linked services we just created: one for the SQL database and one for the Storage Account.
Navigate to Datasets > New dataset. This opens the New Dataset window; select Azure SQL Database. Give it a name, select the linked service and the table “Employees”, and click OK.
Verify the dataset by navigating to its Connection tab and clicking Preview Data.
Now create another dataset, this time selecting Azure Blob Storage > Continue > Delimited Text > Continue. Give it a name, select the linked service, set the file path to “employee”, and click OK.
Create Copy Activity Pipeline
To create a new pipeline, navigate to Pipelines > New Pipeline, which opens the pipeline canvas.
Navigate to Activities > Move & Transform > Copy Data, then drag and drop the Copy Data activity onto the canvas. The Copy Data activity requires a source dataset and a sink dataset, so select the only listed dataset in the Source and Sink tabs respectively.
Now your pipeline is ready to perform the copy activity.
To make sure everything is configured correctly, click Validate, then click Debug to run the pipeline. If the pipeline executes successfully, you will see a success message.
Note: You need to allow the Data Factory server IP address in the firewall settings of the Azure SQL Server for the debug run to connect.
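This firewall rule can also be added from the CLI. The special 0.0.0.0 start/end address below is Azure's convention for allowing connections from Azure services, which covers Data Factory; it is a broad setting, shown here only as a quick sketch with placeholder resource names:

```shell
# Allow Azure services (including Data Factory) through the SQL Server firewall
az sql server firewall-rule create --resource-group demo-resource-group \
  --server demo-sql-server --name AllowAzureServices \
  --start-ip-address 0.0.0.0 --end-ip-address 0.0.0.0
```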
Navigate to Storage Account > Containers > employee, where you will find an employee.csv file created by the Data Factory pipeline.
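You can verify the same result from the CLI by listing the container's contents (the storage account name is a placeholder):

```shell
# List the blobs written by the pipeline into the "employee" container
az storage blob list --container-name employee \
  --account-name demodatastg --query "[].name" --output table
```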
This completes the end-to-end integration of a Data Factory pipeline and its components. I hope this gives you a good understanding of the Data Factory pipeline.
If you have any suggestions/feedback, please put them in the comment box.
Happy Learning 🙂