Exploring Azure, DevOps and Software Development

Extracting GZip & Tar Files Natively in .NET Without External Libraries

2023-06-24T12:00:00.000Z

Introduction

Imagine being in a scenario where a file of type .tar.gz lands in your Azure Blob Storage container. This file, when uncompressed, yields a collection of individual files. The trigger event for the arrival of this file is an Azure function, which springs into action, decompressing the contents and transferring them into a different container.

In this context, a team may instinctively reach out for a robust library like SharpZipLib. However, what if there is a mandate to accomplish this without external dependencies? This becomes a reality with .NET 7.

In .NET 7, native support for Tar files has been introduced, and GZip is catered to via System.IO.Compression. This means we can decompress a .tar.gz file natively in .NET 7, bypassing any need for external libraries.

This post will walk you through this process, providing a practical example using .NET 7 to show how this can be achieved.

.NET 7: Native TAR Support

As of .NET 7, the System.Formats.Tar namespace was introduced to deal with TAR files, adding to the toolkit of .NET developers:

System.Formats.Tar.TarFile to pack a directory into a TAR file or extract a TAR file to a directory
System.Formats.Tar.TarReader to read a TAR file
System.Formats.Tar.TarWriter to write a TAR file

These new capabilities significantly simplify the process of working with TAR files in .NET. Lets dive in an have a look at a code sample that demonstrates how to extract a .tar.gz file natively in .NET 7.

A Simple Example In .NET 7

Below is an example demonstrating the extraction of a .tar.gz file natively in .NET 7 in a simple console app to extract the contents of a .tar.gz file to a directory

using System;
using System.IO;
using System.IO.Compression;
using System.Formats.Tar;

class Program
{
    static void Main(string[] args)
    {
        string sourceTarGzFilePath = @"C:\_Temp\test.tar.gz";
        string targetDirectory = @"C:\_Temp\ExtractedFiles\";

        string tarFilePath = Path.ChangeExtension(sourceTarGzFilePath, ".tar");

        Directory.CreateDirectory(targetDirectory);

        // Decompress the .gz file
        using (FileStream originalFileStream = File.OpenRead(sourceTarGzFilePath))
        {
            using (FileStream decompressedFileStream = File.Create(tarFilePath))
            {
                using (GZipStream decompressionStream = new GZipStream(originalFileStream, CompressionMode.Decompress))
                {
                    decompressionStream.CopyTo(decompressedFileStream);
                }
            }
        }

        // Extract the .tar file
        using (FileStream tarStream = File.OpenRead(tarFilePath))
        {
            using (TarReader tarReader = new TarReader(tarStream))
            {
                TarEntry entry;
                while ((entry = tarReader.GetNextEntryAsync().Result) != null)
                {
                    if (entry.EntryType is TarEntryType.SymbolicLink or TarEntryType.HardLink or TarEntryType.GlobalExtendedAttributes)
                    {
                        continue;
                    }

                    Console.WriteLine($"Extracting {entry.Name}");
                    entry.ExtractToFileAsync(Path.Combine(targetDirectory, entry.Name), true).Wait();
                }
            }
        }

        // Delete the temporary .tar file
        File.Delete(tarFilePath);

        Console.WriteLine("Extraction Completed");
    }
}

You can also find this on GitHub Gist.

Wrapping Up

The introduction of System.Formats.Tar in .NET 7 marks a significant milestone for developers dealing with .tar.gz files. It provides us with the ability to decompress these file types natively, without relying on external libraries. This functionality is a game-changer as it reduces complexity, minimizes external dependencies, and enhances the versatility of .NET applications.

The new namespace System.Formats.Tar, along with the established System.IO.Compression, effectively handle TAR and GZip files. This considerably simplifies the process, making the .NET environment more self-contained and versatile.

References

Thumbnail image [was taken from the DotNet brand repo]https://github.com/dotnet/brand)
Main image generated by [was taken from the DotNet brand repo]https://github.com/dotnet/brand)

Unzipping and Shuffling GBs of Data Using Azure Functions

2023-05-18T12:00:00.000Z

Consider this situation: you have a zip file stored in an Azure Blob Storage container (or any other location for that matter). This isn’t just any zip file; it’s large, containing gigabytes of data. It could be big data sets for your machine learning projects, log files, media files, or backups. The specific content isn’t the focus - the size is.

The task? We need to unzip this massive file(s) and relocate its contents to a different Azure Blob storage container. This task might seem daunting, especially considering the size of the file and the potential number of files that might be housed within it.

Why do we need to do this? The use cases are numerous. Handling large data sets, moving data for analysis, making backups more accessible - these are just a few examples. The key here is that we’re looking for a scalable and reliable solution to handle this task efficiently.

Azure Data Factory is arguably a better fit for this sort of task, but In this blog post, we will specifically demonstrate how to establish this process using Azure Functions. Specifically we will try to achieve this within the constraints of the Consumption plan tier, where the maximum memory is capped at 1.5GB, with the supporting roles of Azure CLI and PowerShell in our setup.

Setting Up Our Azure Environment

Before we dive into scripting and code, we need to set the stage - that means setting up our Azure environment. We’re going to create a storage account with two containers, one for our Zipped files and the other for Unzipped files.

To create this setup, we’ll be using the Azure CLI. Why? Because it’s efficient and lets us script out the whole process if we need to do it again in the future.

Install Azure CLI: If you haven’t already installed Azure CLI on your local machine, you can get it from here.
Login to Azure: Open your terminal and type the following command to login to your Azure account. You’ll be prompted to enter your credentials.
1
az login
Create a Resource Group: We’ll need a Resource Group to keep our resources organized. We’ll call this rg-function-app-unzip-test and create it in the eastus location (you can ofcourse choose which ever region you like).
1
az group create --name rg-function-app-unzip-test --location eastus

Create a Storage Account: Next, we’ll create a storage account within our Resource Group. We’ll name it unziptststorageacct.

1	az storage account create --name unziptststorageacct --resource-group rg-function-app-unzip-test --location eastus --sku Standard_LRS

Create the Blob Containers: Finally, we’ll create our two containers, ‘Zipped’ and ‘Unzipped’ in the unziptststorageacct storage account.
1
2
az storage container create --name zipped --account-name unziptststorageacct
az storage container create --name unzipped --account-name unziptststorageacct
Now your Azure environment is ready with the specific resource group and storage account names you provided! We’ve got our storage account unziptststorageacct and two containers ‘Zipped’ and ‘Unzipped’ set up for our operations. The next step is to create our zip file.

Concocting Our Data With PowerShell

Our next task is to create a large zip file filled with multiple 100MB files, all brimming with random text. In a real world scenario you would already have these large files, but since we are simulating lets use PowerShell to create them.

If you already have an existing zip file with large’ish files for testing, you can skip this step and use that file instead.

# Set the number of files we want to create
$fileCount = 10

# The path where you want to create the TestFiles directory
$directory = "C:\_Temp\TestFiles"

# Create a new directory for our files if it doesn't already exist
if(-not (Test-Path -Path $directory)){
    New-Item -ItemType Directory -Path $directory
}

# Loop through and create our files
for ($i=1; $i -le $fileCount; $i++){
    # Generate a 100MB file filled with random text and save it in our new directory
    $fileContent = New-Object byte[] 104857600
    (New-Object Random).NextBytes($fileContent)
    [System.IO.File]::WriteAllBytes("$directory\File$i.txt", $fileContent)
}

# Now that we have all our files, let's zip them up
Compress-Archive -Path "$directory\*" -DestinationPath "$directory.zip"

This is a simple script that is creating 10 files, each 100MB in size, and then zipping them up into a single file. The resulting zip file should be around the 1GB in size.

Incase you are wondering how we end up with a 1GB+ file by compressing 1GB worth of data? we are generating files filled with random bytes. Compression algorithms work by finding and eliminating redundancy in the data. Since random data has no redundancy, it cannot be compressed. In fact, trying to compress random data can even result in output that is slightly larger than the input, due to the overhead of the compression format.

We’ll use this file to test our Azure Function.

Azure Function To Unzip

We’re going to create a Function that magically springs into action the moment a blob (our zipped file) lands in the ‘Zipped’ container. This function will stream the data, unzip the files, and stores them neatly as individual files in the ‘Unzipped’ container.

Before we begin, ensure that you’ve installed the Azure Functions Core Tools locally. You’d also need the Azure Functions Extension for Visual Studio Code.

First lets use the CLI to create our consumption plan function app. We’ll call it unzipfunctionapp and use the unziptststorageacct storage account we created earlier. We’ll also specify the runtime as dotnet and the functions version as 4. We are using the consumption plan to demonstrate that this solution can work within the constraints of the consumption plan, where the maximum memory is capped at 1.5GB.

az functionapp create --resource-group rg-function-app-unzip-test --consumption-plan-location eastus --runtime dotnet --functions-version 4 --name unzipfunctionapp123 --storage-account unziptststorageacct

You might need to change the function name in the example about from ‘unzipfunctionapp123’. This could already be taken; this is because, Azure function app name must have Globally unique name.
When you create Azure function app, you specify the name which becomes part of URL .azurewebsites.net

If the function app name is already taken you will get an error ‘Website with given name unzipfunctionapp already exists.’ when you run the cli command above.

Now that we have a consumption plan function infra, lets see the full code that will do the actual task of unzipping and uploading
There are two code samples and both are quite similar in their basic approach. They both handle the data in a streaming manner, which allows them to deal with large files without consuming a lot of memory.

However, there are some differences in the details of how they handle the streaming, which may have implications for their performance and resource usage:

The first code sample uses the ZipArchive class from the .NET Framework, which provides a high-level, user-friendly interface for dealing with zip files. The second code sample uses the ZipInputStream class from the SharpZipLib library, which provides a lower-level, more flexible interface.
In the first code sample, the ZipArchive automatically takes care of reading from the blob stream and unzipping the data. It provides an Open method for each entry in the zip file, which returns a stream that you can read the unzipped data from. In the second code sample, you manually read from the ZipInputStream and write to the blob stream using the StreamUtils.Copy method.
The second code sample manually handles the buffer size with new byte[4096] for copying data from the zip input stream to the blob output stream. In contrast, the first code sample relies on the default buffer size provided by the UploadFromStreamAsync method.

Memory wise both are similar (i.e.: they don’t download the entire zip file into memory), but the first script takes around 20 minutes to process a 1GB zip file (with 10 * 100 MB files), whereas the second script takes about 10 minutes for the same 1GB zip file. This mainly comes down to setting the custom buffer size and the optimizations in the SharpZipLib library

First script has the benefit of not importing any custom library, but cant not run on an Azure consumption plan, at the time of this writing, consumption plan has a max 10 minute runtime.
Second script can potentially run on a consumption plan, but comes at a cost of having to import a 3rd party library.

References

Thumbnail image was taken from the Azure site
Main image generated by DALL-E

Azure DevTest Labs Policies

2023-01-31T11:00:00.000Z

Azure DevTest Labs offers a powerful cloud-based development workstation environment and great alternative to a local development workstation/laptop when it comes to software development. This blog post is not so much talking about the benefits of DevTest Lab, but more about how to create policies for DevTest Labs using Bicep. Although there is a good support for deploying DevTest labs with Bicep, there is little to no documentation when it comes to creating policies for DevTest Labs in Bicep. In this blog post, we will focus on creating policies for DevTest Labs using Bicep and how to go about doing this.

A Brief Overview of Azure DevTest Labs

Azure DevTest Labs is a managed service that enables developers to quickly create, manage, and share development and test environments. It provides a range of features and tools designed to streamline the development process, minimize costs, and improve overall productivity. By leveraging the power of the cloud, developers can easily spin up virtual machines (VMs) pre-configured with the necessary tools, frameworks, and software needed for their projects.

Existing Documentation Limitations

While the existing documentation covers various aspects of Azure DevTest Labs, it lacks clear guidance on setting up policies with DevTest Labs in Bicep. This blog post aims to address that gap by providing a Bicep script for creating a DevTest Lab and applying policies to it. Shout out to my colleague Illian Y for persisting and not giving up and finding a away around undocumented features and showing me.

Existing Documentation For Creating a DevTest Lab

The existing documentation for creating a DevTest Lab is pretty good, but when it comes to creating policies for DevTest Lab this is where the documentation falls short. The documentation does not provide a Bicep script for creating policies for DevTest Labs.

Vanilla DevTest Lab

resource lab 'Microsoft.DevTestLab/labs@2018-09-15' = {
  name: 'testLab'
  location: 'australiacentral'
  tags: {
    tagName1: 'test-tag'
    tagName2: 'test-tag1'
  }
  properties: {
    environmentPermission: 'Contributor'
    labStorageType: 'Premium'
    mandatoryArtifactsResourceIdsLinux: []
    mandatoryArtifactsResourceIdsWindows: []
    premiumDataDisks: 'Disabled'
    announcement: {
      enabled: 'Disabled'
      expired: false
    }
    support: {
      enabled: 'Enabled'
      markdown: 'Test'
    }
  }
}

Creating Policies for DevTest Labs in Bicep

The documentation states all the possible policies that can be created under the fact name in PolicyProperties

Below is a list of three of those policies that can be created in Bicep.

Allowed VM Sizes
Allowed VMs Per User
Allowed Premium SSD Per User

Linking the policies to the DevTest Labs

This is the important glue that is missing from the documentation, how to link the policies to the DevTest Labs. The way to do this is to create a resource policySetParent and link it to the DevTest Labs. The policySetParent resource is then used as the parent for the policies.

resource policySetParent 'Microsoft.DevTestLab/labs/policysets@2018-09-15' existing = {
  parent: lab
  name: 'default'
}

Allowed VM Sizes

resource allowedVmSizesPolicies 'Microsoft.DevTestLab/labs/policysets/policies@2018-09-15' = {
  name: 'allowedVmSizesPolicy'
  location: location
  parent: policySetParent
  properties: {
    evaluatorType: 'AllowedValuesPolicy'
    factName: 'LabVmSize'
    status: 'Enabled'
    threshold: '["Standard_D4_v2","Standard_E4_v2"]'
  }
}

Allowed VM’s per user

resource allowedVmsPerUserPolicies 'Microsoft.DevTestLab/labs/policysets/policies@2018-09-15' = {
  name: 'allowedVmsPerUserPolicy'
  location: location
  parent: policySetParent
  properties: {
    evaluatorType: 'MaxValuePolicy'
    factName: 'UserOwnedLabVmCount'
    status: 'Enabled'
    threshold: '4'
  }
}

Allowed Premium SSD Per User

resource allowedPremiumSSDPerUserPolicies 'Microsoft.DevTestLab/labs/policysets/policies@2018-09-15' = {
  name: 'allowedPremiumSSDPerUserPolicy'
  location: location
  parent: policySetParent
  properties: {
    evaluatorType: 'MaxValuePolicy'
    factName: 'UserOwnedLabPremiumVmCount'
    status: 'Enabled'
    threshold: '4'
  }
}

References

Main & thumbnail image was taken from the Azure site

Azure Logic Apps Timeout

2022-10-19T11:00:00.000Z

Recently I got pulled into a production incident where a logic app was running for a long time (long time in this scenario was > 10 minutes), but the intention from the dev crew was they wanted this to time out in 60 seconds. These logic apps were a combination of HTTP triggers and Timer based.

Logic App Default Time Limits

First things to keep in mind are some default limits.

If its a HTTP based trigger the default timeout is around 3.9 minutes
For most others the default max run duration of a logic app is 90 days and min is 7 days

Ways To Change Defaults

With that, here are a couple of quick ways to make sure your Logic App times out and terminates within the time frame you set. Lets say if we want our Logic App to run no more than 60 seconds at max then:

You can change the setting Runtime.Backend.FlowRunTimeout from the default 90 days to 7 days (keep in mind the minimum for this setting is 7 days which is quite large, refer to this issue : https://github.com/Azure/logicapps/issues/782#issuecomment-1609008805)

PRO: This will make sure that the Logic App runs for a maximum of 7 days only (which is quite large)
CON: However this will apply to all the Logic Apps in the host/tenant, meaning if you had 15 logic apps then all 15 will have the 7 day limit

Have a branch with in the Logic App itself to control the timeout (shown in the below diagram)

PRO: You have full control of timeout per Logic App, so some can have 30 second time outs while others 60 seconds etc
CON: There will be an extra branch/logic in your logic app

Time-Out Branch In Logic App

Below is how a potential timeout out setting in a Logic App could look like. You create a “Delay” branch and set the desired time limit, in the example below its 2 minutes so if the other flow takes longer than two minutes then the delay will finish, logic app will be terminated and a cancelled status will be returned to the user in the below example. Shout out to my colleague John B for this awesome idea.

References

Main image was taken from the Azure site
Thumbnail image was taken from Azure SVG icons

Create A Multi User Experience For Single Threaded Applications Using Azure Container Apps

2022-09-11T12:00:00.000Z

How to make a single-threaded app multi-threaded? This is the scenario I faced very recently. These were legacy web app(s) written to be single-threaded; in this context single-threaded means can only serve one request at a time. I know this goes against everything that a web app should be, but it what it is.

So if we have a single threaded web app (legacy) now all of a sudden we have a requirement to support multiple users at the same time. What are our options:

Re-architect the app to be multi threaded
Find a way to simulate multi threaded behavior

Both are great options, but in this scenario option 1 was out, due to the cost involved in re-writing this app to support multi threading. So that leaves us with option 2; how can we at a cloud infra level easily simulate multi threaded behavior. Turns out if we containerize the app (in this case it was easy enough to do) we orchestrate the app such that for each http request is routed to a new container (ie: every new http request should spin up a new container and request send to it)

Options For Running Containers

So when it comes to running a container in Azure our main options are below

Here we need to orchestrate containers, ie: at a minimum for every new http request spin a new one), which means we only have two viable options, Azure Kubernetes Service (AKS) or Azure Container Apps (ACA). Both are valid options, each with their own pros/cons, with AKS its a lot more complex we will need to :

Think of networking
Think of vm’s/vm scale sets for nodes
Choose ingress controller and set up ingress rules
Identity
Plus many more, here is the baseline reference for AKS

So in short, as flexible as AKS is its not as easy as something like ACA which is a fully managed version of AKS that abstracts all the complexities of Kubernetes. So for this scenario to prove we can simulate multi threaded experience lets go ahead with ACA.

Sample Single Threaded Program

For this demo below is a simple C# DotNet app that simulates a single threaded behavior, essentially its doing a lock on a static variable which blocks the whole process for 6 seconds. So when we visit the /test endpoint we lock the whole app.

public class Program
{
    private static readonly object LockObject = new();

    public static void Main(string[] args)
    {
        var builder = WebApplication.CreateBuilder(args);

        // Add services to the container.
        builder.Services.AddAuthorization();

        builder.Services.AddEndpointsApiExplorer();
        builder.Services.AddSwaggerGen();

        builder.Services.AddApplicationInsightsTelemetry();


        var app = builder.Build();

        // Configure the HTTP request pipeline.
        if (app.Environment.IsDevelopment())
        {
            app.UseSwagger();
            app.UseSwaggerUI();
        }

        app.UseAuthorization();

        app.MapGet("/test", (HttpContext httpContext) =>
        {
            if (Monitor.TryEnter(LockObject, new TimeSpan(0, 0, 6)))
            {
                try
                {
                    Thread.Sleep(5000);
                }
                finally
                {
                    Monitor.Exit(LockObject);
                }
            }

            return ("Hello From Container: " + System.Environment.MachineName);
        });

        app.Run();
    }
}

Azure Container Apps

For this demo the easiest way to create the Azure Container Apps environment is through Visual Studio, you right click, publish and go through the menus and in the end VS will create a Container Apps Environment and deploy the code as a container to ACA.

Once this is all done, we should have a resource group like below

Azure Container Apps Scaling

Next we go to the container app (the single threaded api we just deployed) and set up a simple http scale rule that will spin up a new container for every 1 http incoming request. In the example below we set min-replicas to 0 and max to 30 this means that when there is no traffic it will scale down to 0 and at peak it will hit 30.

Testing

Now go to the url of the container app and hit it simultaneously in browser tabs, when I opened it in multiple browser tabs out of 10 tabs about 7 were served by unique containers and based on the test code above I see it being served by different container ids

Tab1: Hello From Container: single-threaded-api-app-20220731--ps4yjjp-66f4885b65-w5s6h
Tab2: Hello From Container: single-threaded-api-app-20220731--ps4yjjp-66f4885b65-gs8qf
Tab3: Hello From Container: single-threaded-api-app-20220731--ps4yjjp-66f4885b65-x7grl
etc

So its not 100% every single request goes to a brand new container, but very easily and very quickly with out too much complexity we were able to achieve a 70 - 90% of requests being served with new containers, so in essence we found a quick way to simulate a pseudo - multi threaded experience for our legacy single threaded app with out too much effort.

Application Gateway Ingress Controller For AKS

2022-08-19T12:00:00.000Z

Recently I ran into an interesting issue with an AKS cluster running 2000+ services. There is nothing wrong in running 2000+ services that’s what Kubernetes is there for, scale! but the interesting aspect that caught my attention was trying to get the Applicaiton Gateway Ingress Controller (AGIC) to ingress to all these services. I had worked with Istio and NGINX for ingress into AKS with no issues and never AGIC, so I had to try this to see where it worked well, what the advantages are and where the limitations are.

Application Gateway

Application Gateway (App Gateway) is a well-established layer 7 service that has been around for a while, some of the major features are:

URL routing
Cookie-based affinity
SSL termination
End-to-end SSL
Support for public, private, and hybrid web sites
Integrated web application firewall
Zone redundancy
Connection draining

This post isn’t focused on the App Gateway itself, it’s more about how and what it can do as an ingress controller for AKS. You can find out more about App Gateway and all abouts its features here

TLDR;

Benefits of AGIC

Direct connection to the pods without an extra hop, this results in a performance benefit up to 50% lower network latency compared to in-cluster ingress
Could make a huge difference in performance and latency sensitive applications and workloads
If going the AKS add-on route then it becomes fully managed and updated
In cluster ingress consumes and competes for AKS compute/memory resources where was with App Gateway separated from the cluster it won’t be leeching any of the AKS compute
Full benefits of the Application Gateway such as WAF, cookie-based affinity, ssl termination amongst many others

Limitations

Application Gateway has some backend limits. Backend pools are limited to 100.
Application Gateway does have a pricing implication
Routing is directly to pod IP’s rather than the ClusterIP of the service. There is a feature request open for this

Application Gateway Ingress Controller (AGIC)

AGIC went to GA around the end of 2019 and offered the possibilities of hooking up an App Gateway as an attractive alternative for ingress into an AKS cluster. Before moving any further with AGIC, we need to understand at a high-level how networking works in AKS.

There are two main network models:

Kubenet networking
- Default option for Kubernetes out of the box
- Each Node receives an IP from the Azure virtual network subnet
- Pods in the node are not associated to the Azure vnet, they are assigned an IP address from the PodIPCidr and a route table is created by AKS
Azure Container Networking Interface networking (CNI)
- Each pod itself receives an IPaddress from the Azure virtual network subnet
- Pods can be directly reached via their private IP from connected networks
- Pods can access resources in the vnet directly with out issues (e.g.: function app in the same vnet)

It’s important to note, once you create an AKS cluster with a given network model you can’t change it; you will have to create a new one. There are advantages and disadvantages in both models which are listed in detail in this link.

One key consideration to highlight is:

Kubenet - /24 IP range can support up to 251 nodes (each subnet reserves the first 3 IP addresses for management operations). Given the maximum nodes per pod in Kubenet is 110, this configuration can support a maximum of 251 * 110 = 27,610 pods
CNI - the same /24 IP range can support a maximum of 8 nodes (CNI has a max of thirty pods per node). So, this configuration can support a maximum of 240

When it comes to CNI you will have to plan for the IP addresses, you might need to a /16 range to get a bigger node count. There are also limitations with the kubenet that will need to be taken into consideration.

With the AKS networking models out of the way, let’s look at AGIC; regardless of which model is chosen, the goal for AGIC is to ingress directly to the pod, a simple representation of this can be seen below. AGIC when deployed, runs in a pod in the AKS cluster and watches for changes, when changes are detected (i.e.: a new pod has been added or existing pod removed) these IP changes are propagated to the App Gateway via the Azure Resource Manager.

If we went with the CNI networking model, then the pod would get IP address from the vnet and there would be a mapping in the App Gateway. Alternatively, with the Kubenet model this is how App Gateway will be setup, it will try to assign the same routable created by AKS to App Gateway’s subnet.

It’s important to note, whichever model you choose the App Gateway will always connect directly to the pod and this is by design.

Deploying AGIC

AGIC can be deployed in two ways either using Helm or as an AKS add-on. Each has their pros and cons, the key benefit of going via an AKS add-on will be that it will be fully managed and auto updated by Azure (i.e.: all updates, patching etc. for the AGIC will be taken care of automatically) whereas with Helm you will have to do that yourself.

Let’s go ahead and deploy a demo AKS cluster with AGIC and see it in action to understand exactly what is going on. For the sake of simplicity, this demo will be creating an AKS cluster with CNI networking model and deploying the AGIC as and AKS add-on.

Create an AKS cluster

Login and set the right subscription

1 2	az login az account set -s "your-subcription-id"

Create a new resource group

1	az group create --name agicTestResourceGroup --location eastus

Here we are creating a new AKS cluster with CNI networking model (–network-plugin azure) and we are setting up App Gateway as ingress and in this instance we are saying our App Gateway’s name is “testAppGateway” which doesn’t exist and will be created for us

Create AKS cluster

az aks create -n agicTestCluster -g agicTestResourceGroup --network-plugin azure --enable-managed-identity -a ingress-appgw --appgw-name testAppGateway --appgw-subnet-cidr "10.225.0.0/16" --generate-ssh-keys

If we go into the Azure Portal, we can see two resource groups (one of them is what we created and this where the Azure managed AKS control plane is), the other resource group (MC_agicTestResourceGroup_agicTestCluster_eastus) is where the node pool, vnet, App Gateway etc all live, this resource group gets created automatically for us as part of the az aks create command.

Deploy a sample API

Now we have the AKS cluster up and running with AGIC deployed as an add-on, let’s deploy a sample API app and set ingress through the App Gateway.

Get credentials to the AKS cluster

1	az aks get-credentials -n agicTestCluster -g agicTestResourceGroup

Deploy a sample API

1	kubectl apply -f https://gist.githubusercontent.com/Ricky-G/59eb109913bd45d3e9229f9cf0a97edc/raw/b336047feecd9fd89fbe1a9627ac385b525124fe/sample-api-aks-deployment.yaml

The above sample API deployment yaml was taken from the AGIC GitHub repo, the only change made to it was added a minimum of 10 replicas. We are saying we need 10 pods running this API. As soon as you run this you should see the app deployed as a service and 10 pods running successfully and there is a cluster-IPIP set for this (cluster-IP is an IP load balancer that Kubernetes creates, we just need to call this IP and our traffic will be forwarded to one of the 10 pods)

Now if we go to the resource group where we have the actual Application Gateway and go to backend pool, we can see there is one here created by AGIC and if we dig into the pool all the IP addresses of the 10 pods are listed here. So, we have direct ingress to the pods from the Application Gateway.

Finally, if we run the below command, we should see an ingress IP address for “aspnetapp” which is our sample API. This is the public IP of the Application Gateway, which has been wired up to ingress all the way to the pod. If we paste this IP into the browser, we can see sample aspnet site served from the pod.

1	kubectl get ingress

Right, so we have successfully ingressed all the way from public ip going via Application Gateway all the way to our pod.

Benefits of AGIC

Direct connection to the pods without an extra hop, this results in a performance benefit up to 50% lower network latency compared to in-cluster ingress
Could make a huge difference in performance and latency sensitive applications and workloads
If going the AKS add-on route then it becomes fully managed and updated
In cluster ingress consumes and competes for AKS compute/memory resources where was with App Gateway separated from the cluster it won’t be leeching any of the AKS compute
Full benefits of the Application Gateway such as WAF, cookie-based affinity, ssl termination amongst many others

Limitations

Application Gateway has some backend limits. Backend pools are limited to 100.
Application Gateway does have a pricing implication
Routing is directly to pod IP’s rather than the ClusterIP of the service. There is a feature request open for this

Closing Thoughts

Key thing to keep in mind is the backend pool limitation of 100 . If you have more than 100 “ingres-able” services, then you would need multiple Application Gateway’s to cater for this. Although it is a supported scenario and straightforward to set up multiple App Gateways for one AKS cluster, your costs will pile up.

At the start of this post, I mentioned a scenario of 2000+ services, in this case we would need 20 App Gateways; 2000 services / 100 = 20. Due to cost implications this won’t be palatable in most cases.

On the plus side you get direct connection to the pod and can shave 50% of network latency. So, in this 2000+ services in one cluster scenario we could put the App Gateway as ingress for just latency sensitive apps/API’s and use another traditional in cluster-based ingress for all the other services. This way you get the best of both words while still keeping below the App Gateway max backend pool limits.

One neat option for an in cluster-based ingress could be Web Application Routing, which is still in preview at the time of writing this. It’s a managed NGINX based solution that should work well as an in cluster-based ingress controller

References

AGIC main documentation
AGIC GitHub
Main image was taken from the Azure site and slightly modified

Deploying To IP Restricted Azure Function Apps Using GitHub Actions

2022-08-06T12:00:00.000Z

In the previous post we blocked our function app to be available only to the APIM via ip restrictions.

This secures our function app and it isn’t available publicly, any one that tries to access our function app url will get “HTTP 403 Forbidden”.

This secures our function app; now what about deploying code changes to the function app via GitHub Actions? we should be able to CI/CD to our function app, but there is a problem here. The GitHub action will fail with the same “HTTP 403 Forbidden”, this is because GitHub actions run on runners (its a hosted virtual environment), each time we run the Action we get a new runner and it can have a different ip address. So how can we get around this? do we white list the entire GitHub ip range?

GitHub’s ip ranges can change any time, so will have to keep scanning for changes to these ranges and proactively update our ip restrictions, this is not very scalable or practical. So what are other ways of getting around this? we have a couple of ways to get around this.

Possible Solutions

There are two viable solutions here

1. Use a self-hosted runner

Where you bring your own VM’s with static ip’s and whitelist these static ip’s

Pros:

Full control over your devops agents
Can optimize/reuse these agents for various CI/CD workloads for your cloud and on-prem deployments

Cons:

You have to provision and maintain your own VM’s, there will be time and effort required for this
Extra costs to maintain your own VM(s), although this could be optimized by turning them off after hours etc
You miss out on the free GitHub Action minutes you get
Extra work of provisioning VM’s, installing all the tooling for builds, maintaining and paying for them

2. Do some extra steps in the existing GitHub Actions

Use the Azure CLI
Do an az login
Grab the public ip of the GitHub runner, you could use a simple public api like the ipify api to grab the public ip of the Github Runner
Use az cli to update ip restriction to add this additional ip
Do-your-normal-Deployment
Use az cli to remove the ip added in step 4

Pros:

You use the same GitHub runner and workflow
No effort in provisioning or maintaining extra virtual machines yourself
Little bit of extra code is all that is needed

Cons:

There is a possibility that the GitHub action runner fails/crashes after doing step 4 but before it had a chance to get to step 5, you could be left with an extra ip address white listed in your app until you run the workflow again.

This post is all about how to go about doing option 2 (do some extra steps in the existing GitHub Actions), although there is one con (ie: the GitHub runner crashing during step 5 and leaving an ip address of the runner there), in my view this is a very small risk. The chances of a crash precisely at that point are low and even it does happen the risk of having the runner ip (only 1 extra ip) for a short duration until your next run happens is very low.

Show me the code

If you want to skip and just get to the code:

In the above GitHub Action it is deploying a hello world function app; it is doing a dotnet build, package and deploy. Those are all the standard bits of deploying a function app; lets go over the interesting bits

Getting the GitHub Runners public ip
Whitelisting this ip
After a successful deploy of our app, we remove the ip added in step 2

For the first step we are using a public package haythem/public-ip@v1.2 to get the ip. We can also manually do a curl our >selves to the ipify api and grab the public ip ourselves. For the purposes of this demo we will use this package.

Step 1 - getting the GitHub runners public ip

1
2
3

- name: Public IP
  id: ip
  uses: haythem/public-ip@v1.2

Next for the second step we use the az cli to add the ip address.
First we use az webapp config to set the –use-same-restrictions-for-scm-site false, here we are saying don’t apply the same restriction as the main site to the scm site
Our main site is still safe with the right ip restrictions, our scm site is now ready for changes
Next we use az functionapp config access-restriction to add the GitHub runner ip to just the scm site

Step 2 - white listing the GitHub runner’s public ip

- name: 'Allow Github Runner IpAddress'
  uses: azure/CLI@v1
  with:
    azcliversion: 2.37.0
    inlineScript: |
        az webapp config access-restriction set -g $ -n func-app-iprest-demo --use-same-restrictions-for-scm-site false
        az functionapp config access-restriction add -g $ -n func-app-iprest-demo --rule-name github_runner --action Allow --ip-address $ --priority 100 --scm-site true

Finally we remove the ip address we added from the previous step and set the scm site access the same as our main site

Step 3 - after successful deploy, remove the GitHub runner’s public ip

- name: 'Remove Github Runner IpAddress'
  uses: azure/CLI@v1
  with:
    azcliversion: 2.37.0
    inlineScript: |
        az functionapp config access-restriction remove -g $ -n func-app-iprest-demo --rule-name github_runner --scm-site true
        az webapp config access-restriction set -g $ -n func-app-iprest-demo --use-same-restrictions-for-scm-site true

Finally 👏! we can now deploy using GitHub Actions to ip restricted function apps 🙌.

References

As always a big thank you to Unsplash for providing a huge range of images for free

Cover image has been taken from https://unsplash.com/photos/842ofHC6MaI

Securing Azure Functions and Logic Apps

2022-07-31T12:00:00.000Z

Here is a scenario that I recently encountered. Imagine we are building micro-services using serverless (a mix on Azure Function Apps and Logic Apps) with APIM in the front. Lets say we went with the APIM standard instance and all the logic and function apps are going to be running on consumption plan (for cost reasons as its cheaper). This means we wont be getting any vnet capability and our function and logic apps will be exposed out to the world (remember to get vnet with APIM we have to go with the premium version, we are going APIM standard here for cost saving reasons).

So how do we restrict our function and logic apps to only go through the APIM, in another words all our function and logic apps must only go through the APIM and if anyone tries to access them directly they should be getting a “HTTP 403 Forbidden”.

Lets visualize this scenario; We have some WAF capable ingress endpoint, in this case its Azure Front Door, that is forwarding traffic to APIM which then sends the requests to the serverless apps.
Reason for having Front Door before APIM is because APIM doesn’t have WAF natively so we will need to put something in front of it that has that capability to be secure.

There are few options like Azure Firewall, Application Gateway etc, but for the purposes of this scenario we have Azure Front Door in front of APIM (and we can have an APIM policy that will only accept traffic from Azure Font Door, we wont be going in to that, we will keep it to securing our function apps to just being available via APIM for today)

Securing the function app

First we will need to get the public ip address of the APIM
White-list this address in our function app network restrictions

Getting the public ip of APIM

You can go to the APIM resource in the Azure portal and get it from there

Or you can use the CLI and run

1	az apim show --name "apim-name" --resource-group "resource-group-name"

White-listing the function app

You need to go into networking -> access restriction
Only allow the APIM ip (once you enter this, the deny all will automatically come ie: all other ip’s are denied)
Its important that the SCM site is also blocked. More about Kudu service that powers the SCM site here

What happens if you try to access this function

Now its all blocked we get a nice HTTP 403 Forbidden

What about deploying code to this function via GitHub Actions

When you try to deploy to these functions using GitHub Actions or even via Azure Devops you will get the same HTTP 403 and wont be able to deploy. This is because the GitHub runner’s ip address will be blocked; remember we are only allowing APIM in, all others are blocked.

There are a couple of ways to get around this. I talk about this in the next post, you can check it out here

References

Cover image has been taken from https://azure.microsoft.com/en-us/services/functions/#overview

Hello World 👋

2022-07-26T12:00:00.000Z

After sitting on this for a long time and wanting to blog / write down my thoughts, I’ve finally got my act together and started this. There were so many times I was asked some very good questions which I am sure not just the person asking me but a lot more would have been interested in knowing the answer/solution/thoughts around the matter. This is a way to write about that and help the wider community who are searching for similar solutions.

I regularly answer in Stack Overflow and in some cases I wrote a question and answered it myself just incase some one was looking for something similar, that wasn’t really the ideal platform to do that. There have been so many times that going through and reading other people’s blogs have helped me and unlocked me in problems that I was stuck with; this is a in a way trying to give back to the community and helping people that are on the look out for a solution for a similar problem.

How to power the blog

There were so many choices out there when it came to what frameworks and libraries to use to build the blog and what to use to host the blog.

My requirements when it came to building were simple

Easy to author posts
Easy to build
Easy to maintain
Most customizations (eg: search, ads, tags, categories etc) should come out of the box

My requirements when it came to hosting were even simpler

Has to be free
Has to be able to handle ‘some’ level of load
Easy to CI/CD

Main choices here boiled down to:

All the options were good, I really liked Hugo, it was so easy to create a site. But all of them were geared towards creating a CMS / generic site. I was looking for something that had all the things needed for a blog out of the box with out having to grab lots of plugins or write something custom.

Jekyll and GitHub pages were really good, it nailed most of the things, but I didn’t really want to go down the road of learning Jekyll just to host a blog. This left one and Hexo fit my requirements beautifully. It was a dedicated Javascript framework that has all the things I was looking for out of the box and it had 360+ themes available all community built and free.

One thing I loved about Hexo is the fact its builds the source to a static site and you can use GitHub to host the static site and use GitHub Actions to build the static site from source.

This is what I went with in the end, Hexo to build the blog. I write everything in markdown files and Hexo builds it out into a nice static site and I host it using GitHub pages as a public repo

There are some limits of hosting with GitHub Pages, the main one is the 100GB of bandwidth as a soft limit. Since this is just a static site 100GB should be plenty but if and when it comes to that I will look at putting a CDN in front.

Final Result

Hexo to build the blog into a static site
Icarus theme
GitHub Pages to host the site
Bulma to help enrich the markdown files with styling

References

As always a big thank you to Unsplash for providing a huge range of images for free

Cover image has been taken from https://unsplash.com/photos/3SIXZisims4