How to manage Azure Data Factory from DEV to PRD | by René Bremer | Aug, 2022


Using Azure DevOps and Azure Data Factory CI/CD utilities

Managing Data Factory pipelines from DEV to PRD — photo by Ricardo Gomez Angel on Unsplash

In software projects, DEV/TST/UAT environments are used to develop and test code. After that, code is released to the PRD environment. Azure Data Factory (ADFv2) projects follow the same phased approach. However, the following ADFv2-specific challenges need to be addressed:

  • An ADFv2 project can only be deployed as a whole rather than pipeline by pipeline. It must therefore be prevented that untested code ends up in production.
  • ADFv2 supports branching and pull requests; however, changes are hard to review in JSON. Changes can effectively only be reviewed in ADFv2 itself, either by opening the branch in DEV or by running it in TST.
  • ADFv2 pipelines cannot be tested in isolation; however, unit and integration testing can be achieved using fixed data sources and pytest libraries. See my previous blog.

In this blogpost and the git repo azure-data-factory-cicd-feature-release, it is discussed how the ADFv2 release cycle can be optimized to prevent untested code from ending up in production, see also the overview below.

1. Managing ADFv2 from DEV to PRD — image by author

Notice that it is no longer required to use the ADFv2 publish button to move code to different environments. Instead, the npm Azure Data Factory utilities library can be leveraged to propagate changes to different environments, following the new recommended ADFv2 CI/CD flow. In the remainder of this blog, the process to deploy to the different environments is discussed.
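To give an idea of what the npm route looks like, below is a minimal shell sketch of the export step. It assumes a package.json in a build folder of the repo that depends on @microsoft/azure-data-factory-utilities and defines a "build" script running node node_modules/@microsoft/azure-data-factory-utilities/lib/index, as in Microsoft's documentation. The folder layout and the function names adf_factory_id and adf_export_arm are hypothetical and not taken from the blog's repo.

```shell
# Hypothetical helpers around the npm package @microsoft/azure-data-factory-utilities.
# Assumes <build_dir>/package.json declares the package and a "build" script:
#   node node_modules/@microsoft/azure-data-factory-utilities/lib/index

# Build the ARM resource ID of a data factory, as expected by the utilities.
adf_factory_id() {
  local subscription_id="$1" resource_group="$2" factory_name="$3"
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.DataFactory/factories/%s\n' \
    "$subscription_id" "$resource_group" "$factory_name"
}

# Install the utilities and export the ADFv2 resources as an ARM template.
# The export validates all resources first, so no separate validate call is made.
#   $1 build_dir   folder containing package.json
#   $2 root_dir    absolute path of the ADFv2 root folder in git
#   $3 factory_id  resource ID of the DEV factory
#   $4 out_dir     output folder, relative to build_dir (e.g. ArmTemplate)
adf_export_arm() {
  local build_dir="$1" root_dir="$2" factory_id="$3" out_dir="$4"
  (
    cd "$build_dir" || exit 1
    npm install && npm run build export "$root_dir" "$factory_id" "$out_dir"
  )
}
```

In a pipeline, the generated output folder would typically be published as a build artifact and used as the input for the deployments to the other environments.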

In chapter 2, the project is set up using the following steps:

  • 2.1 Prerequisites
  • 2.2 Create Azure DevOps project
  • 2.3 Set up ADFv2 DEV instance
  • 2.4 Create feature branch in ADFv2 DEV

2.1 Prerequisites

The following resources are required in this tutorial:

Subsequently, go to the Azure portal and create a resource group in which all Azure resources will be deployed. This can also be done using the following Azure CLI command:

az group create -n "<<your resource group>>" -l "<<your location>>"

2.2 Create Azure DevOps project

Azure DevOps is a tool to continuously build, test, and deploy your code to any platform and cloud. Once you have created a new project, go to Repos and choose to import the following repository:

See also the picture below, in which an Azure DevOps repo is created from the git repo of this blog:

2.2.1 Add repository to your Azure DevOps project, image by author
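The import can also be scripted. The sketch below assumes the azure-devops extension of the Azure CLI (az extension add --name azure-devops) and a prior az login; the function name and all arguments are placeholders, and the source URL is the repository referred to above.

```shell
# Hypothetical helper: create an empty Azure Repos repository and import a
# git repository into it (requires: az extension add --name azure-devops).
import_repo() {
  local org_url="$1" project="$2" repo_name="$3" source_url="$4"
  az repos create --name "$repo_name" --org "$org_url" --project "$project" &&
    az repos import create --git-source-url "$source_url" \
      --repository "$repo_name" --org "$org_url" --project "$project"
}
```

Usage: import_repo "https://dev.azure.com/<<your organization>>" "<<your project>>" "<<your repo name>>" "<<url of the repository above>>".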

Subsequently, a service connection is needed to access the resources in the resource group from Azure DevOps. Go to Project settings, then Service connections, and select Azure Resource Manager, see also the picture below.

2.2.2 Add repository to your Azure DevOps project, image by author

Select Service Principal Authentication and limit the scope to the resource group you created earlier, see also the picture below.

2.2.3 Limit scope to resource group, image by author
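As an alternative to letting Azure DevOps create the service principal, one can be created upfront with the Azure CLI and entered manually when creating the service connection. This is a sketch under that assumption; the Contributor role and the function names are illustrative choices, not taken from the blog.

```shell
# Hypothetical helpers: create a service principal whose role assignment is
# limited to one resource group (illustrative role: Contributor).
rg_scope() {
  local subscription_id="$1" resource_group="$2"
  printf '/subscriptions/%s/resourceGroups/%s\n' "$subscription_id" "$resource_group"
}

create_scoped_sp() {
  local sp_name="$1" subscription_id="$2" resource_group="$3"
  # Prints appId, password and tenant; treat the password as a secret.
  az ad sp create-for-rbac --name "$sp_name" --role Contributor \
    --scopes "$(rg_scope "$subscription_id" "$resource_group")"
}
```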

2.3 Set up ADFv2 DEV instance

In this paragraph, the Azure Data Factory instance used for development is created. This ADFv2 DEV instance shall use the Azure DevOps git repo that was created in step 2.2.

  • In this link, it is explained how to create an Azure Data Factory instance in the portal.
  • In this link, it is explained how a code repository can be added to ADFv2. Make sure the Azure DevOps git repo of 2.2 is added.

Alternatively, the Azure CLI can be used to create a data factory, including its code repository, see here. See below for a successfully deployed ADFv2 instance that is linked to the Azure DevOps git repo.

2.3 Azure Data Factory instance linked to Azure DevOps
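For reference, a minimal sketch of creating the DEV instance with the Azure CLI, assuming the datafactory extension (az extension add --name datafactory). It only creates the factory; linking the Azure DevOps git repo is assumed to be done in the portal as described above. The function name is hypothetical.

```shell
# Hypothetical helper: create an (empty) ADFv2 instance.
# Requires: az extension add --name datafactory
create_factory() {
  local resource_group="$1" factory_name="$2" location="$3"
  az datafactory create --resource-group "$resource_group" \
    --factory-name "$factory_name" --location "$location"
}
```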

2.4 Create feature branch in ADFv2 DEV

After everything is set up, development can start. A best practice in software development is to create development branches from the main branch, in which developers build their features.

In this blog, it is key that new branches are created in Azure DevOps rather than in ADFv2 itself. This is because the Azure DevOps pipeline yml file must also be included in the new branch, which ADFv2 does not do. Go to your Azure DevOps project, select Branches and create a new branch, see also below.

2.4 Create new branch in Azure DevOps
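The same can be done from a local clone with plain git, which makes explicit that the new branch starts from main and therefore contains azure-pipelines-release.yml. This sketch assumes the clone's remote origin points to the Azure DevOps repo; the function name is hypothetical.

```shell
# Hypothetical helper: branch off the latest main and publish the branch,
# so it contains everything on main, including azure-pipelines-release.yml.
create_feature_branch() {
  local branch="$1"
  git fetch origin main &&
    git checkout -b "$branch" origin/main &&
    git push --set-upstream origin "$branch"
}
```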

A developer can now find the branch in ADFv2 and start coding in it. Once the developer has finished coding in the branch, it shall be reviewed by other developers and tested. This is discussed in the next chapter.

In chapter 3, a feature branch is reviewed and deployed to an ADFv2 instance for testing. The following steps are executed:

  • 3.1 Create Pull Request (PR)
  • 3.2 Deploy feature branch to a new ADFv2 TST instance

3.1 Create pull request (PR)

After a developer has finished a branch, a PR shall be created so that others can review it. Go to the Azure DevOps project, go to Branches, select your feature branch and select New pull request, see below.

3.1 Create pull request

Fill in the form, including the names of the developers who shall review the changes. An email is sent to notify these developers.
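Creating the PR can also be scripted with the azure-devops CLI extension. The sketch assumes the organization and project are set as defaults (az devops configure --defaults organization=<<your organization url>> project=<<your project>>); the function name is hypothetical, and at least one reviewer is expected.

```shell
# Hypothetical helper: open a PR from a feature branch into main.
# Requires the azure-devops extension with default organization/project.
#   $1 repo, $2 feature branch, $3 title, remaining args: reviewers
create_pr() {
  local repo="$1" branch="$2" title="$3"
  shift 3
  az repos pr create --repository "$repo" --source-branch "$branch" \
    --target-branch main --title "$title" --reviewers "$@"
}
```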

3.2 Deploy feature branch to a new ADFv2 TST instance

As discussed in the introduction, it is hard to review changes to ADFv2 pipelines in JSON. A PR shall therefore always be reviewed in an ADFv2 instance itself. There are three ways to do this:

  • Go to the ADFv2 DEV instance, select the branch and review the changes. Testing can be a challenge, since the branch is still in the DEV environment.
  • Deploy the changes to a generic, existing ADFv2 TST instance. Deploying code to this instance overwrites what is currently deployed there.
  • Deploy the changes to a new, dedicated ADFv2 TST instance named after the feature branch. This way, changes can also be tested in isolation.

The new, dedicated ADFv2 TST instance approach is taken in this paragraph. Go to your Azure DevOps project, open your repo, find azure-pipelines-release.yml and adapt the values to point to your workspace, see also below.

variables:
  adfdev: "<<your dev ADFv2 instance>>"
  adftst: $(Build.SourceBranchName)
  subiddev: "<<your dev subscription id>>"
  subidtst: "<<your tst subscription id>>"
  rgdev: "<<your dev resource group>>"
  rgtst: "<<your tst resource group>>"
  location: "<<your location>>"
  AzureServiceConnectionId: "<<your AzureServiceConnectionId>>"
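Because adftst is set to $(Build.SourceBranchName), the branch name becomes the factory name. Azure DevOps fills that variable with the last path segment of the branch (refs/heads/feature/login gives login), and ADFv2 names may only contain letters, digits and dashes, must start and end with a letter or digit, and must be 3 to 63 characters long and globally unique. The hypothetical helper below, which is not part of the blog's pipeline, sketches how a branch name could be sanitized against these rules before use; global uniqueness can only be checked against Azure itself.

```shell
# Hypothetical helper: derive a valid ADFv2 factory name from a git ref.
# Mimics Build.SourceBranchName by taking the last path segment, then
# applies the ADFv2 naming rules: letters, digits and single dashes only,
# starting and ending with a letter or digit, 3 to 63 characters.
adf_name_from_branch() {
  local ref="$1" name
  # Last path segment, e.g. refs/heads/feature/login -> login
  name="${ref##*/}"
  # Replace invalid characters by dashes, collapse repeats, trim the ends.
  name="$(printf '%s' "$name" | sed -E 's/[^A-Za-z0-9-]+/-/g; s/-+/-/g; s/^-//; s/-$//')"
  # Truncate to 63 characters; this may expose a trailing dash again.
  name="$(printf '%.63s' "$name")"
  name="${name%-}"
  if [ "${#name}" -lt 3 ]; then
    echo "error: '$name' is too short for a data factory name" >&2
    return 1
  fi
  printf '%s\n' "$name"
}
```

For example, adf_name_from_branch refs/heads/feature/login_page prints login-page.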

Now select Pipelines in your Azure DevOps project and click “New pipeline”. In the wizard, select Azure Repos Git and the git repo you created earlier. In the Configure tab, choose “Existing Azure Pipelines YAML file” and then azure-pipelines-release.yml, which can be found in the git repo, see also below.
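The wizard steps correspond roughly to an az pipelines create call from the azure-devops CLI extension, sketched here with placeholder arguments and a hypothetical function name. By default the command also queues a first run, similar to the wizard.

```shell
# Hypothetical helper: create a pipeline from an existing YAML file in an
# Azure Repos git repo (requires the azure-devops extension).
create_yaml_pipeline() {
  local name="$1" repo="$2" branch="$3" yml_path="$4"
  az pipelines create --name "$name" --repository "$repo" \
    --repository-type tfsgit --branch "$branch" --yml-path "$yml_path"
}
```

Usage: create_yaml_pipeline "<<your pipeline name>>" "<<your repo>>" "<<your feature branch>>" azure-pipelines-release.yml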

Once the pipeline is created, it runs immediately. When the run succeeds, a new data factory instance is created with the name of the feature branch. After the feature branch has been reviewed and tested successfully, it can be merged back into main. This is discussed in the next chapter.

In chapter 4, the main branch is deployed using the following steps:

  • 4.1 Merge feature branch to main branch
  • 4.2 Deploy main branch to ADFv2 UAT
  • 4.3 Deploy main branch to ADFv2 PRD

4.1 Merge feature branch to main branch

After the feature branch has been reviewed and tested successfully, it shall be merged back into main. Select the pull request, add a comment and mark the PR as complete, see also the image below.

4.1 Close PR, merge feature branch to main branch
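Completing the PR can also be done from the CLI. A sketch, assuming the azure-devops CLI extension with the organization and project configured as defaults; the function names are hypothetical. Branch policies such as required approvals still apply.

```shell
# Hypothetical helpers (azure-devops extension, default organization/project).
# Look up the ID of the active PR from a feature branch into main.
find_active_pr_id() {
  local repo="$1" branch="$2"
  az repos pr list --repository "$repo" --source-branch "$branch" \
    --target-branch main --status active \
    --query "[0].pullRequestId" --output tsv
}

# Complete (merge) the PR with the given ID.
complete_pr() {
  local pr_id="$1"
  az repos pr update --id "$pr_id" --status completed
}
```

Usage: complete_pr "$(find_active_pr_id "<<your repo>>" "<<your feature branch>>")"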

The feature branch is now merged into the main branch. As a next step, regression tests shall be run on the main branch in the UAT environment. This is discussed in the next paragraph.

4.2 Deploy main branch to ADFv2 UAT

The main branch is always the branch that is deployed to PRD. This ensures that only one code base is used in production. However, after a feature has been merged into main (see 4.1), a couple of regression tests shall be run first to make sure the main branch does not contain conflicting changes. In this paragraph, the main branch is deployed to UAT.

Go to your Azure DevOps project, open your repo, find azure-pipelines-release.yml and adapt the values to point to your workspace, see also below.

variables:
  adfdev: "<<your dev ADFv2 instance>>"
  adfuat: "<<your uat ADFv2 instance>>"
  subiddev: "<<your dev subscription id>>"
  subiduat: "<<your uat subscription id>>"
  rgdev: "<<your dev resource group>>"
  rguat: "<<your uat resource group>>"
  location: "<<your location>>"
  AzureServiceConnectionId: "<<your AzureServiceConnectionId>>"

Now take steps similar to those described in paragraph 3.2 to create and run the pipeline. Two additional remarks:

  • azure-pipelines-release.yml contains the snippet trigger: main. This implies that whenever the main branch changes (e.g. when a feature branch is merged), the pipeline runs and the tests can be executed.
  • The Azure DevOps pipeline that deploys to UAT only needs to be created once.
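Next to the automatic trigger, a run on main can also be queued by hand, for example to repeat the regression tests without a new merge. A sketch with the azure-devops CLI extension, assuming the organization and project are configured as defaults; the pipeline name is whatever was chosen when the UAT pipeline was created, and the function name is hypothetical.

```shell
# Hypothetical helper: queue a run of an existing pipeline on main
# (requires the azure-devops extension with default organization/project).
run_on_main() {
  local pipeline_name="$1"
  az pipelines run --name "$pipeline_name" --branch main
}
```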

When the tests pass, the code is ready to be deployed to PRD. This is discussed in the next paragraph.

4.3 Deploy main branch to ADFv2 PRD

When the regression tests in UAT are successful, the main branch is ready to be deployed to PRD. This can be done by adding a “deploy ADFv2 to PRD” task to the pipeline created in 4.2. Two additional remarks on regression testing and deployment to PRD:
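As an illustration of such a PRD task, the sketch below deploys the ARM template generated by the ADF utilities export (ARMTemplateForFactory.json and ARMTemplateParametersForFactory.json) to the PRD factory with az deployment group create. The folder name, the function name and the single factoryName override are assumptions; in practice, environment-specific linked service parameters usually need overriding as well, and Microsoft's documentation recommends stopping and restarting triggers around the deployment with its pre- and post-deployment script.

```shell
# Hypothetical helper: deploy the exported ARM template to the PRD factory.
#   $1 resource group, $2 PRD factory name, $3 folder with the exported template
deploy_factory_template() {
  local resource_group="$1" factory_name="$2" template_dir="$3"
  az deployment group create --resource-group "$resource_group" \
    --template-file "$template_dir/ARMTemplateForFactory.json" \
    --parameters "@$template_dir/ARMTemplateParametersForFactory.json" \
    --parameters factoryName="$factory_name"
}
```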

In a typical software project, DEV/TST/UAT environments are used to develop and test code. After that, the code is released to the PRD environment. Azure Data Factory (ADFv2) pipelines follow the same release cycle. However, an ADFv2 project can only be deployed as a whole, and there is a risk that untested code is deployed into production. In this blog and the git repo azure-data-factory-cicd-feature-release, it is described how ADFv2 releases can be managed, see also the overview below.

5. Managing ADFv2 from DEV to PRD — image by author


