Azure In-VM Instance Metadata & Managed Service Identities with ARM Templates and GoLang combined – inside/out

A lot has changed since my last blog post… we had a great and beautiful summer with an awesome vacation, and I am now part of the Azure Customer Advisory Team, which is the customer-facing part of Azure Engineering. So, I finally ended up in Jason Zander’s part of Microsoft, the person who is responsible for Azure itself. That means I am now involved in the most complex Azure projects we run with customers and am no longer dedicated to SAP only, although I still work with SAP a lot.

Now, in the meantime a lot of Azure tech stuff expanded as well. In this post I want to focus on two specific features – the In-VM Instance Metadata Service and the Managed Service Identity (in short, MSI) which we recently started using in a customer project even before MSI got publicly available and announced.

I’ve posted about the need for in-VM instance metadata as well as an approach for allowing Virtual Machines to perform automated management operations in a previous blog-post, already. While what I wrote back then is technically still possible, MSI and in-VM Instance Metadata are the recommendation for such scenarios right now. So, you can consider this as the long-awaited follow-up post for this previous one!

Recap the scenario

The scenario I posted about back then was about virtual machines that need to read data about themselves and also modify configuration settings about themselves through Azure Resource Manager REST API calls. In the meantime, the very same customer I blogged about back then came to us with a new scenario that requires a similar capability.

Essentially, in that scenario a VM needed to capture its own IP addresses and determine the IP addresses of its peers to perform automated configuration of networking routes and keepalived settings for an HA setup (more details to follow in a separate blog post).

All of this is possible through a combined use of the new Azure in-VM instance metadata service and the Managed Service Identity!

In-VM Instance Metadata in a Nutshell

This is really nothing special; AWS and other cloud providers have had it for ages already. It essentially gives applications and scripts running inside of the VM an HTTP endpoint that is available from within the VM only. This endpoint returns basic details about a Virtual Machine such as its name, network configuration, unique identifiers etc. For Azure Virtual Machines, the endpoint returns JSON-formatted data about the virtual machine that looks similar to the following:

myuser@mylinuxvm:~$ curl -H Metadata:true "" | jq

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   515  100   515    0     0   115k      0 --:--:-- --:--:-- --:--:--  125k
{
  "compute": {
    "location": "westeurope",
    "name": "mylinuxvm",
    "offer": "UbuntuServer",
    "osType": "Linux",
    "platformFaultDomain": "0",
    "platformUpdateDomain": "0",
    "publisher": "Canonical",
    "sku": "16.04-LTS",
    "version": "16.04.201708151",
    "vmId": "d7......-9...-4..4-b..b-2..........4",
    "vmSize": "Standard_D2s_v3"
  },
  "network": {
    "interface": [
      {
        "ipv4": {
          "ipAddress": [
            {
              "privateIpAddress": "",
              "publicIpAddress": "xx.xx.xx.xx"
            }
          ],
          "subnet": [
            {
              "address": "",
              "prefix": "24"
            }
          ]
        },
        "ipv6": {
          "ipAddress": []
        },
        "macAddress": "00........B3"
      }
    ]
  }
}

It’s a simple REST service that is only accessible to anything that runs inside of the VM. All you need to take care of is passing the Metadata: true HTTP header when calling into the service. The call above shows the basics only. The service provides much more; for a complete look, review the documentation.
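Once you have that JSON document, pulling out the pieces you care about is straightforward. Below is a minimal Go sketch, assuming a response shaped like the sample output above. The struct, the function name privateIPs and the sample address 10.0.0.4 are made up for illustration and are not part of my sample repository.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// instanceMetadata mirrors only a few of the fields shown in the sample output.
type instanceMetadata struct {
	Compute struct {
		Name     string `json:"name"`
		Location string `json:"location"`
		VMSize   string `json:"vmSize"`
	} `json:"compute"`
	Network struct {
		Interface []struct {
			IPv4 struct {
				IPAddress []struct {
					PrivateIPAddress string `json:"privateIpAddress"`
					PublicIPAddress  string `json:"publicIpAddress"`
				} `json:"ipAddress"`
			} `json:"ipv4"`
			MacAddress string `json:"macAddress"`
		} `json:"interface"`
	} `json:"network"`
}

// privateIPs parses a metadata document and returns the VM name together
// with the private IPv4 addresses of all network interfaces.
func privateIPs(doc []byte) (string, []string, error) {
	var md instanceMetadata
	if err := json.Unmarshal(doc, &md); err != nil {
		return "", nil, err
	}
	var ips []string
	for _, nic := range md.Network.Interface {
		for _, addr := range nic.IPv4.IPAddress {
			ips = append(ips, addr.PrivateIPAddress)
		}
	}
	return md.Compute.Name, ips, nil
}

// sampleDoc is made-up data in the shape of the output above;
// 10.0.0.4 is a placeholder address, not a value from a real VM.
const sampleDoc = `{"compute":{"name":"mylinuxvm","location":"westeurope","vmSize":"Standard_D2s_v3"},
"network":{"interface":[{"ipv4":{"ipAddress":[{"privateIpAddress":"10.0.0.4","publicIpAddress":""}]},"macAddress":"000000000000"}]}}`

func main() {
	name, ips, err := privateIPs([]byte(sampleDoc))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(name, ips)
}
```

Decoding into a struct that lists only the needed fields keeps the code robust when the service adds more properties later, since unknown JSON fields are simply ignored.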

Managed Service Identities (MSI)

The in-VM instance metadata service is great if you need to query details about the VM itself. But what if you need to query more? For example, which other servers are available in the same resource group, so that you can configure keepalived automatically for an HA setup that uses unicast instead of multicast for the availability pings? That’s especially important on Azure, since multicast is blocked by the VNET infrastructure. Finding out which other servers are available in the same resource group is not possible through the in-VM instance metadata service!

In my previous blog post about this topic, when Instance Metadata and MSI were not available yet, the scenario was for a Marketplace Image to open up ports on Azure NSGs as part of an automated process after the user entered more details into a post-provisioning registration application that ran inside of the VM. Again, such actions require access to the Azure Resource Manager REST APIs… and that, in turn, requires authenticating against Azure Active Directory with a valid principal.

In the past, you had to manually create a Service Principal for such actions and assign permissions in the Azure Subscription for it. Then, from within the VM, you had to sign-in against Azure AD from your script or application using this Service Principal to gain access to the Azure Resource Manager REST APIs. This introduced a very delicate challenge: where would you store the credentials for being able to sign-in with the Service Principal from within the VM!?

With Managed Service Identities, these kinds of scenarios become way easier to implement, and the challenge of managing secrets for Service Principals in Virtual Machines goes away. With MSIs activated, all sorts of Azure service instances can get identities assigned which are fully managed by Azure through its Microsoft.ManagedIdentity resource provider.

MSIs can be enabled on Virtual Machines, but also other types of Services as you can read in the documentation. You can enable it through the portal, via an ARM template or with PowerShell or the Azure CLI!

Enabling Managed Service Identities

There are two pieces to enabling an MSI:

  • Assigning an MSI to a resource which essentially results in the creation of a “managed service principal” for an Azure Resource such as a Virtual Machine that is made available to this Azure Resource, only!
  • Making tokens available to the respective resource for which the Managed Service Identity has been created. For VMs, this happens through a Virtual Machine Extension called the ManagedIdentityExtensionForWindows or ManagedIdentityExtensionForLinux, respectively. When the extension is enabled for a virtual machine, any software running inside of the VM can request a token which is created as a result of an authentication against Azure AD with the MSI credentials. You don’t have to take care about those credentials since they are managed by the MSI infrastructure for you.

Once you have an MSI attached to a Virtual Machine (or another Azure Resource), you can assign permissions to this identity for performing management operations against resources in your Azure subscriptions. The following screen shot shows this in the portal:

Assigning Permissions to a Managed Service Identity

If you need to assign the permissions via CLI, then you need to get the object IDs and App IDs for the service principals which are managed for you behind the scenes. Below is an excerpt of Azure CLI commands and results showing what you need to do!

mszcool@dev:~$ az vm show --resource-group LinuxHaWithUdrs --name lxHaServerVm0 --out json
  "id": "/subscriptions/a...fe/resourceGroups/LinuxHaWithUdrs/providers/Microsoft.Compute/virtualMachines/lxHaServerVm0",
  "identity": {
    "principalId": "f3....26d",
    "tenantId": "72....47",
    "type": "SystemAssigned"
  "instanceView": null,
  "licenseType": null,
  "location": "westeurope",
  "name": "lxHaServerVm0",
  "networkProfile": {
  "osProfile": {
  "plan": null,
  "provisioningState": "Succeeded",
  "resourceGroup": "LinuxHaWithUdrs",
  "resources": [
  "storageProfile": {
  "tags": {},
  "type": "Microsoft.Compute/virtualMachines",
  "vmId": "52.....6bf"
mszcool@dev:~$ az ad sp show --id f3....26d
AppId             DisplayName       ObjectId          ObjectType
----------------  ----------------  ----------------  ----------------
8b............f1  RN_lxHaServerVm0  f3............6d  ServicePrincipal

As you can see, when you get the VM object through ARM, it contains a new section called identity, which holds the details about the managed service identity that you need to retrieve further details from Azure AD (done above with the CLI as well).

That information can be used for things such as creating custom roles with permissions and then assigning the MSI to this custom role instead of assigning explicit permissions.

An end-2-end example

As I’ve mentioned before, one of the main use cases, also for my customer, for using these assets combined is all about VMs that need to retrieve (and modify) details about themselves and their peers in a joint deployment. In a simplified example I want to demonstrate the basic mechanics of the Instance Metadata Service and the Managed Service Identity so that you understand how you can make use of them in your own scripts and applications.

The sample builds the foundation for the scenarios I’ve explained earlier (VMs getting infos about themselves and their peers). Rather than trying to hit it all with a single post, you can expect more complex scenario posts later on that make use of the mechanics explained in this post.

Essentially, the sample creates an infrastructure with a jump-box and a set of servers as shown in the following Azure Network Watcher topology diagram.

All of the code is available on my GitHub repository for review:

Network Watcher Topology

On each of the servers, a simple Go-based REST API runs which shows the instance metadata of the server itself and lists all the other servers in the same resource group. The servers are exposed through an Azure Load Balancer using NAT so that every server can be accessed individually on its own port, which makes it possible to call into specific servers. Note that I’ve set it up this way for demo purposes only, so that you can easily access each server and examine its instance metadata and its view of its peers individually.

In a real-world environment I can hardly think of any scenario where you would expose instance metadata or data about peers to the public directly. So, to reiterate: this is for demo purposes only.

Assigning MSIs to the Servers and giving them permissions

For the sample, I used ARM templates to assign MSIs to the individual Server VMs and enable the respective MSI VM extension so that an application running inside of the respective VM can get a token for accessing resources under the identity of the VM it’s running in. The excerpt is from the azuredeploy.json template on my GitHub repository.

    {
        "apiVersion": "[variables('computeAPIVersion')]",
        "type": "Microsoft.Compute/virtualMachines",
        "copy": {
            "name": "serverVmCopy",
            "count": "[parameters('serverCount')]"
        },
        "name": "[concat(variables('serverVmNamePrefix'), copyIndex())]",
        "location": "[parameters('location')]",
        "identity": {
            "type": "systemAssigned"
        },
        "dependsOn": [
            "[resourceId('Microsoft.Storage/storageAccounts', variables('storageAccountName'))]",
            ...
        ],
        "properties": {
            ...
        }
    },
    {
        "apiVersion": "[variables('computeAPIVersion')]",
        "type": "Microsoft.Compute/virtualMachines/extensions",
        "name": "[concat(variables('serverVmNamePrefix'),copyIndex(),'/IdentityExtension')]",
        "location": "[parameters('location')]",
        "copy": {
            "name": "serverVmMsiExtensionCopy",
            "count": "[parameters('serverCount')]"
        },
        "dependsOn": [
            "[resourceId('Microsoft.Compute/virtualMachines', concat(variables('serverVmNamePrefix'), copyIndex()))]"
        ],
        "properties": {
            "publisher": "Microsoft.ManagedIdentity",
            "type": "ManagedIdentityExtensionForLinux",
            "typeHandlerVersion": "1.0",
            "autoUpgradeMinorVersion": true,
            "settings": {
                "port": "[variables('msiExtensionPort')]"
            },
            "protectedSettings": {}
        }
    }

As you can see above, the server-VM gets a system assigned identity in the ARM template. Further down in the template, the Managed Identity Extension is activated for each server VM instance. The variable msiExtensionPort is set to 50342 in my example, which means that an application or script running inside of the VM can retrieve a token for management operations from within the VM on that port (http://localhost:50342/oauth2/token).

Taking care of RBAC

Now we have an MSI and the ability for applications to get tokens when running inside of the VM. But so far the possibilities of using that identity are limited since it does not have any permissions yet. These are assigned through the ARM template as well:

    {
        "apiVersion": "[variables('authAPIVersion')]",
        "type": "Microsoft.Authorization/roleAssignments",
        "name": "[parameters('rbacGuids')[add(mul(copyIndex(),2),1)]]",
        "copy": {
            "name": "serverVmRbacDeployment",
            "count": "[parameters('serverCount')]"
        },
        "dependsOn": [
            "[resourceId('Microsoft.Compute/virtualMachines', concat(variables('serverVmNamePrefix'), copyIndex()))]"
        ],
        "properties": {
            "roleDefinitionId": "[variables('rbacContributorRole')]",
            "principalId": "[reference(concat(resourceId('Microsoft.Compute/virtualMachines',concat(variables('serverVmNamePrefix'),copyIndex())),'/providers/Microsoft.ManagedIdentity/Identities/default'),variables('managedIdentityAPIVersion')).principalId]",
            "scope": "[resourceGroup().id]"
        }
    }

This assigns the Contributor role, scoped to the resource group the VMs are deployed in, to the MSIs created for the VMs, which allows them to read the resources in that resource group. To get the role definition ID, which is stored in [variables('rbacContributorRole')] in my template, I had to execute an Azure CLI statement along the lines of the following:

az role definition list --query "[?properties.roleName == 'Contributor']" --out json

The next tricky bit is the name of the RBAC role assignment. Unfortunately, that needs to be a unique GUID. In my very simplified example, I pass in the GUIDs for the role assignments as parameters in the template:

"rbacGuids": {
    "type": "array",
    "metadata": {
        "description": "Exactly ONE UNIQUE GUID for each server VM is needed in this array for the RBAC assignments (sorry for that)! WARNING: if you want to keep this template deployment repeatable, you must generate new GUIDs for every run or delete RBAC assignments before running it, again!"
    "defaultValue": [
    "minLength": 4,
    "maxLength": 18

The reason for this is to make it simple to replace those values as part of an integrated CI/CD pipeline with every continuous build that might involve such an ARM-template deployment. I might write a separate, short post about that topic. For now, I just grab a GUID for each server-RBAC-assignment I want to make as part of my template to generate a unique name for the assignment by using "name": "[parameters('rbacGuids')[add(mul(copyIndex(),2),1)]]".
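To make the index arithmetic in that expression tangible, here is a small Go sketch of it. The function names are made up for illustration. It only restates add(mul(copyIndex(),2),1), which picks the odd-numbered entries of the array; what the even-numbered entries are used for is not visible in the excerpt above.

```go
package main

import "fmt"

// rbacGuidIndex mirrors the template expression add(mul(copyIndex(),2),1).
func rbacGuidIndex(copyIndex int) int {
	return copyIndex*2 + 1
}

// minGuidsNeeded returns the smallest array length for which the index
// of every server (copyIndex 0..serverCount-1) is still inside the array.
func minGuidsNeeded(serverCount int) int {
	if serverCount <= 0 {
		return 0
	}
	return rbacGuidIndex(serverCount-1) + 1
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Printf("server %d -> rbacGuids[%d]\n", i, rbacGuidIndex(i))
	}
	fmt.Println("GUIDs needed for 3 servers:", minGuidsNeeded(3))
}
```

With this arithmetic, two servers need an array of at least 4 entries and nine servers need 18, which lines up with the minLength and maxLength values of the parameter.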

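The next tricky part of this section in the template is getting the ID of the principal created for the managed service identity of the respective server VM. This part of the template really gets hard to read, so I broke it up into multiple lines although you cannot do that in a real template:

    "properties": {
        "roleDefinitionId": "[variables('rbacContributorRole')]",
        "principalId": "[reference(
                            concat(
                                resourceId(
                                    'Microsoft.Compute/virtualMachines',
                                    concat(variables('serverVmNamePrefix'), copyIndex())
                                ),
                                '/providers/Microsoft.ManagedIdentity/Identities/default'
                            ),
                            variables('managedIdentityAPIVersion')
                        ).principalId]",
        "scope": "[resourceGroup().id]"
    }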

The code is using the reference()-template-function to get the principal ID of the service principal created as managed identity. That principal is a child-object of the virtual machine, so we need to start with the resourceId() of the virtual machine and attach the identities section to it. Finally, the reference()-function requires an API version where we use the version for the managed identity provider from a variable "managedIdentityAPIVersion": "2015-08-31-PREVIEW" in the code.
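If the nesting is still hard to follow, it may help to see the resulting string. The following Go sketch composes the same path the template functions produce, assuming the usual ARM resource ID layout that is also visible in the az vm show output further above. The function names and the all-zero subscription ID are placeholders for illustration.

```go
package main

import "fmt"

// vmResourceID builds what resourceId('Microsoft.Compute/virtualMachines', name)
// evaluates to for a VM in the given subscription and resource group.
func vmResourceID(subscriptionID, resourceGroup, vmName string) string {
	return fmt.Sprintf(
		"/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Compute/virtualMachines/%s",
		subscriptionID, resourceGroup, vmName)
}

// identityResourceID appends the child path of the managed identity,
// which is the resource that reference() reads the principalId from.
func identityResourceID(vmID string) string {
	return vmID + "/providers/Microsoft.ManagedIdentity/Identities/default"
}

func main() {
	// Placeholder subscription ID; resource group and VM name are taken from the CLI example.
	vmID := vmResourceID("00000000-0000-0000-0000-000000000000", "LinuxHaWithUdrs", "lxHaServerVm0")
	fmt.Println(identityResourceID(vmID))
}
```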

Getting a Token for your MSI

Based on the requests from that specific customer project where we needed this functionality, I decided to use Go as a programming language. I am still not a GoLang-expert, so I took the opportunity to learn. Using MSIs always follows two major steps:

  • Acquire a token through the locally installed VM Extension. This happens by calling the http://localhost:<port-selected-in-MSI-extension-settings>/oauth2/token endpoint which is offered by the MSI VM Extension.
  • Use that token in REST API calls to the Azure Resource Manager. These are regular REST calls with the HTTP Authorization header containing the bearer token retrieved earlier.

In my GoLang-based example, I have one module contained in the file msitoken.go which performs a REST-call against the local OAuth2 server offered by the VM Extension (note that this is an incomplete excerpt, for the full code look at the file msitoken.go on my GitHub repo):

// etc. ...

const msiTokenURL string = "http://localhost:%d/oauth2/token"
const resourceURL string = ""

// etc. ...

var myToken MsiToken

// Build a request to call the MSI Extension OAuth2 Service
// The request must contain the resource for which we request the token
finalRequestURL := fmt.Sprintf("%s?resource=%s", fmt.Sprintf(msiTokenURL, msiPort), url.QueryEscape(resourceURL))
req, err := http.NewRequest("GET", finalRequestURL, nil)
if err != nil {
    log.Printf("--- %s --- Failed creating http request --- %s", t.Format(time.RFC3339Nano), err)
    return myToken, "{ \"error\": \"failed creating http request object to request MSI token!\" }"
}

// Set the required header for the HTTP request
req.Header.Add("Metadata", "true")

// Create the HTTP client and call the MSI token service
client := &http.Client{}
resp, err := client.Do(req)
if err != nil {
    t = time.Now()
    log.Printf("--- %s --- Failed calling MSI token service --- %s", t.Format(time.RFC3339Nano), err)
    return myToken, "{ \"error\": \"failed calling MSI token service!\" }"
}
// Make sure the response body gets closed when the function returns
defer resp.Body.Close()

// Now decode the token JSON or return another error if the status code is not in 2xx range
if (resp.StatusCode >= 200) && (resp.StatusCode <= 299) {
    dec := json.NewDecoder(resp.Body)
    err := dec.Decode(&myToken)
    // etc. ...
}
// etc. ...

Two aspects are important:

  • First, you always need to add the “Metadata: true” header for the call. All other calls will be rejected!
  • Second, you need to add a query-string parameter to the request called resource=uri://to-your-resource-you-want-to-do-calls-to. In our case, this is always the resource of the Azure Resource Manager REST APIs.

Once we have executed the call, we do have a valid token available. Note that we didn’t have to fiddle around or deal with any kinds of secrets which is super-convenient. The Azure MSI infrastructure is totally taking care of the required details and there is not even a possibility to get access to any kinds of secrets for Managed Identities.
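To give you an idea of what comes back, here is a minimal Go sketch that decodes a token response and builds the Authorization header value from it. Treat the details as assumptions: the JSON field names follow the usual OAuth2-style response (access_token, token_type, expires_on, resource), the struct is my illustration rather than the exact type from msitoken.go, and the sample uses a fake token with https://management.azure.com/ as the assumed Resource Manager resource.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MsiToken holds the parts of the token response this sketch needs.
// The JSON tags are assumptions based on the OAuth2-style response format.
type MsiToken struct {
	AccessToken string `json:"access_token"`
	TokenType   string `json:"token_type"`
	ExpiresOn   string `json:"expires_on"`
	Resource    string `json:"resource"`
}

// decodeMsiToken parses the response body and rejects responses without a token.
func decodeMsiToken(body []byte) (MsiToken, error) {
	var token MsiToken
	if err := json.Unmarshal(body, &token); err != nil {
		return MsiToken{}, err
	}
	if token.AccessToken == "" {
		return MsiToken{}, fmt.Errorf("response contains no access_token")
	}
	return token, nil
}

// authorizationHeader builds the value for the HTTP Authorization header.
func authorizationHeader(token MsiToken) string {
	return fmt.Sprintf("%s %s", token.TokenType, token.AccessToken)
}

// sampleTokenResponse is made-up data; the token is not a real JWT.
const sampleTokenResponse = `{"access_token":"not-a-real-token","expires_on":"1506484173","resource":"https://management.azure.com/","token_type":"Bearer"}`

func main() {
	token, err := decodeMsiToken([]byte(sampleTokenResponse))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(authorizationHeader(token))
}
```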

Using the MSI Token

This is the rather simple part of the story because it’s no different to any other Azure REST API call performed with any other kind of Azure AD user/principal. Once you have the token, you just use it in the HTTP Authorization header to call into the Azure Resource Manager REST APIs and if permissions are set up as previously outlined when I wrote about RBAC, all should go well.

The following snippets are parts of the GoLang source file mypeers.go:

const (
    environmentNameSubscription string = "SUBSCRIPTION_ID"
    environmentNameResourceGroup string = "RESOURCE_GROUP"

    restAPIEndpoint string =

    vmRelativeEndpoint string =

    authorizationHeader string = "%s %s"
)

func GetMyPeerVirtualMachines(msiToken MsiToken) (vms string, errOut string) {
    // etc. ...
    subID := os.Getenv(environmentNameSubscription)
    resGroup := os.Getenv(environmentNameResourceGroup)
    // etc. ...

    // Create the final endpoint URLs to call into the Azure Resource Manager VM REST API
    finalURL := fmt.Sprintf(restAPIEndpoint, 
                              subID, resGroup, vmRelativeEndpoint)
    finalAuthHeader := fmt.Sprintf(authorizationHeader,
                              msiToken.TokenType, msiToken.AccessToken)

    // Build a request to call the Azure Resource Manager REST API
    req, err := http.NewRequest("GET", finalURL, nil)
    if err != nil {
        // etc. ...
    }
    req.Header.Add("Authorization", finalAuthHeader)

    // Create the HTTP client and call the Azure Resource Manager REST API
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        // etc. ...
    }
    // Make sure the response body gets closed when the function returns
    defer resp.Body.Close()

    // Now return the raw VM JSON or another error if the status code is not in 2xx range
    if (resp.StatusCode >= 200) && (resp.StatusCode <= 299) {
        bodyContent, err := ioutil.ReadAll(resp.Body)
        if err != nil {
            // etc. ...
        }
        // etc. ...
        return string(bodyContent), ""
    }

    // etc. ...

    return "", fmt.Sprintf("{ \"error\": \"Azure Resource Manager REST API call returned non-OK status code: %d \" }", resp.StatusCode)
}

This code is super-simple and just retrieves all other servers in the same resource group. It assumes that the resource group and the subscription ID are both set as environment variables before the Go application is started. This should give you an idea of how a server in a resource group could find other servers and get their private IP addresses to automatically configure components such as keepalived during an automated post-provisioning step or something similar.
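The function above returns the raw JSON. As a possible next step, the following Go sketch picks the peer names out of such a response, assuming the usual ARM list shape with a top-level value array. The function name peerNames and the VM names ending in 1 and 2 are made up for illustration. Keep in mind that the private IP addresses are not part of the VM list itself; they sit on the network interface resources each VM references, so getting them takes further calls.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// vmListResponse models only the fields of the list response this sketch uses.
type vmListResponse struct {
	Value []struct {
		ID   string `json:"id"`
		Name string `json:"name"`
	} `json:"value"`
}

// peerNames returns the names of all VMs in the list except the VM itself.
// Names are compared case-insensitively.
func peerNames(body []byte, self string) ([]string, error) {
	var list vmListResponse
	if err := json.Unmarshal(body, &list); err != nil {
		return nil, err
	}
	peers := []string{}
	for _, vm := range list.Value {
		if !strings.EqualFold(vm.Name, self) {
			peers = append(peers, vm.Name)
		}
	}
	return peers, nil
}

// sampleList is made-up data in the assumed shape of the list response.
const sampleList = `{"value":[
  {"id":"/subscriptions/.../virtualMachines/lxHaServerVm0","name":"lxHaServerVm0"},
  {"id":"/subscriptions/.../virtualMachines/lxHaServerVm1","name":"lxHaServerVm1"},
  {"id":"/subscriptions/.../virtualMachines/lxHaServerVm2","name":"lxHaServerVm2"}]}`

func main() {
	peers, err := peerNames([]byte(sampleList), "lxHaServerVm0")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(peers)
}
```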

The Instance Metadata Service

The MSI and ARM REST API calls can help with retrieving details about peers or performing more complex management operations, including creating or updating resources, depending on the permissions given to a particular MSI. But for retrieving details about itself, a VM does not necessarily need to go through MSI and the ARM REST APIs, since there’s a way simpler approach for that.

For a few months now, Azure has offered an in-VM instance metadata service which can be called from within the VM only, but without additional authentication requirements. The documentation about the instance metadata service shows how to retrieve the data with simple tools such as curl. Again, the important thing is to include the Metadata header, as with the MSI token service before.

In this end-2-end sample, I show, how to call the in-VM instance metadata service from a GoLang application. Again, I just show the mechanics, no concrete scenario for this post, but it should equip you with being able to implement scenarios such as the ones I’ve explained several times throughout the post. And I plan for subsequent blog-posts making use of these mechanics for a real scenario implementation. Below again an excerpt of the GoLang-code that retrieves instance metadata, for the full code please review metadata.go:

const instanceMetaDataURL string =

/*GetInstanceMetadata ()
 *Calls the Azure in-VM Instance Metadata service and returns the results to the caller*/
func GetInstanceMetadata() string {
    // etc. ...

    // Build a request to call the Azure in-VM instance metadata service
    req, err := http.NewRequest("GET", instanceMetaDataURL, nil)
    if err != nil {
        // etc. ...
    }

    // Set the required header for the HTTP request
    req.Header.Add("Metadata", "true")

    // Create the HTTP client and call the instance metadata service
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        // etc. ...
    }
    // Make sure the response body gets closed when the function returns
    defer resp.Body.Close()

    if (resp.StatusCode >= 200) && (resp.StatusCode <= 299) {
        bodyContent, err := ioutil.ReadAll(resp.Body)
        // etc. ...
        return string(bodyContent)
    }
    // etc. ...
    // Use %d for the numeric status code (%q would print it as a quoted character)
    return fmt.Sprintf("{ \"error\": \"instance meta data service returned non-OK status code: %d \" }", resp.StatusCode)
}

The Main Go-Application

Before putting it all together, let’s have a quick look at the main GoLang application so that you get a sense, where those previous pieces of code are called from. The main application is fairly simple, it bootstraps a GoLang HTTP server and configures some routes for the HTTP-handlers (full source in main.go).

package main

import (
    "log"
    "net/http"

    // Assumed import: the NewRouter/StrictSlash calls below match gorilla/mux
    "github.com/gorilla/mux"
)

var myRoutes = map[string]func(http.ResponseWriter, *http.Request){
    "/":        Index,
    "/meta":    MyMeta,
    "/servers": MyPeers,
}

func main() {
    router := mux.NewRouter().StrictSlash(true)
    for key, value := range myRoutes {
        router.HandleFunc(key, value)
    }
    log.Fatal(http.ListenAndServe(":8080", router))
}

The handlers.go file then contains the functions which are referred to in the map myRoutes defined in the source code above. These are the actual functions called when the respective route URLs are requested:

/*Index (w, r)
 *Returns with a list of available functions for this simple API*/
func Index(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintln(w, "Welcome!")
}

/*MyMeta (w, r)
 *Returns instance metadata retrieved through the in-VM instance metadata service of the VM*/
func MyMeta(w http.ResponseWriter, r *http.Request) {
    metaDataJSON := GetInstanceMetadata()
    // Fprint instead of Fprintf, so that a '%' in the JSON is not treated as a format verb
    fmt.Fprint(w, metaDataJSON)
}

/*MyPeers (w, r)
 *Uses the MSI to get a token and list all the other servers available in the resource group*/
func MyPeers(w http.ResponseWriter, r *http.Request) {
    token, err := GetMsiToken(50342)
    if err != "" {
        fmt.Fprint(w, err)
    } else {
        peerVms, err := GetMyPeerVirtualMachines(token)
        if err != "" {
            fmt.Fprint(w, err)
        } else {
            fmt.Fprint(w, peerVms)
        }
    }
}
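If you want to try out handlers like these without deploying anything to Azure, Go's httptest package lets you call them in-process. Below is a minimal sketch under a few assumptions: it uses the standard library's ServeMux instead of the router from main.go to stay self-contained, and the stub handlers and helper names are made up; they only stand in for the real functions that call the Azure services.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// stubIndex and stubMeta stand in for the real handlers, which would call
// the instance metadata service and therefore only work inside an Azure VM.
func stubIndex(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "Welcome!")
}

func stubMeta(w http.ResponseWriter, r *http.Request) {
	// Fprint writes the string as-is, so the '%' below is not a format verb.
	fmt.Fprint(w, `{"note":"100% stubbed"}`)
}

// demoRoutes maps URL paths to handlers, similar to the myRoutes map.
var demoRoutes = map[string]http.HandlerFunc{
	"/":     stubIndex,
	"/meta": stubMeta,
}

// newDemoMux registers all routes on a standard library ServeMux.
func newDemoMux(routes map[string]http.HandlerFunc) *http.ServeMux {
	mux := http.NewServeMux()
	for path, handler := range routes {
		mux.HandleFunc(path, handler)
	}
	return mux
}

// call sends an in-process GET request to the handler and returns
// the status code and the response body.
func call(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, path, nil)
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := call(newDemoMux(demoRoutes), "/meta")
	fmt.Println(code, body)
}
```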

Putting it all together

To make exploring this as easy as possible for you, the ARM templates and scripts I provide as part of this solution are setting up the entire environment automatically. To recall, here’s the screen shot of the entire environment from the Azure Network Watcher, again:

Network Watcher Topology

The ARM template sets up the Network, Virtual Machines, Network Security Groups etc. and for making it simple to explore the responses of the different servers without SSHing into the VMs, I also added a Load Balancer that exposes the GoLang application via Port-Mapping to each of the servers on the public load balancer. That means, you can just perform an http-request against the public load balancer with a port that maps to the server for which you would like to see the responses for. A few examples:

Of course, you can also SSH into the Jump-Box set up as part of this deployment and explore everything from the inside. Essentially, what I do is the following as part of the ARM template deployment to automate the setup of the GoLang application:

  • The ARM-template contains a custom script extension that runs on each of the servers to build the Go-application and generate a shell-script that registers the GoLang REST-API I’ve explained above as a service daemon.
  • The Service Daemon script which is generated as part of the server setup and copied to /etc/init.d/ sets the Subscription ID and the target resource group as an environment variable before launching the GoLang Application.

For making the process simple and easy to follow, I use a template for the init.d script that gets filled in by the custom script extension. This template is also on my GitHub repository:

# Provides:          msiandmeta
# Required-Start:    $local_fs $network $named $time $syslog
# Required-Stop:     $local_fs $network $named $time $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: GoLang App using Azure MSI and Metadata
# Description:       Runs a Go Application which is a web server that demonstrates usage of Managed Service Identities and in-VM Instance Metadata



# Starts the simple GO REST service
start() {
    # Needed by the GO App to access subscription and resource group, correctly
    export SUBSCRIPTION_ID=__SUBSCRIPTION_ID__
    export RESOURCE_GROUP=__RESOURCE_GROUP__

    # Check if the service runs by looking at its Process ID and Log Files
    if [ -f $processIDFilename ] && [ "`ps | grep -w $(cat $processIDFilename)`" ]; then
        echo 'Service already running' >&2
        return 1
    fi
    echo 'Starting service...' >&2
    su -c "start-stop-daemon -SbmCv -x /usr/bin/nohup -p \"$processIDFilename\" -d \"$appPath\" -- \"./$appName\" > \"$logFilename\"" $appUserName
    echo 'Service started' >&2
}

# Stops the simple GO REST service
stop() {
    # Not running if the PID file is missing OR no process with that PID exists
    if [ ! -f $processIDFilename ] || [ ! "`ps | grep -w $(cat $processIDFilename)`" ]; then
        echo "Service not running" >&2
        return 1
    fi
    echo "Stopping Service..." >&2
    start-stop-daemon -K -p "$processIDFilename"
    rm -f "$processIDFilename"
    echo "Service stopped!" >&2
}

# Main script execution

case $1 in




      echo "Usage: $0 start|stop|restart"

In this script, you can see tokens such as __SUBSCRIPTION_ID__. These tokens are replaced by the script that’s executed at provisioning time for each of the servers through the custom script extension definition in the main ARM template for the entire solution:

    "apiVersion": "[variables('computeAPIVersion')]",
    "type": "Microsoft.Compute/virtualMachines/extensions",
    "name": "[concat(variables('serverVmNamePrefix'),copyIndex(),'/SetupScriptExtension')]",
    "location": "[parameters('location')]",
    "copy": {
        "name": "serverVmSetupExtensionCopy",
        "count": "[parameters('serverCount')]"
    "dependsOn": [
        "[resourceId('Microsoft.Compute/virtualMachines',concat(variables('serverVmNamePrefix'), copyIndex()))]",
        "[concat('Microsoft.Compute/virtualMachines/', concat(variables('serverVmNamePrefix'),copyIndex()),'/extensions/IdentityExtension')]"
    "properties": {
        "publisher": "Microsoft.Azure.Extensions",
        "type": "CustomScript",
        "typeHandlerVersion": "2.0",
        "autoUpgradeMinorVersion": true,
        "settings": {
            "fileUris": [
        "protectedSettings": {
            "commandToExecute": "[concat('./ -a ', parameters('adminUsername'), ' -s ', subscription().subscriptionId, ' -r ', resourceGroup().name)]"

The script that’s invoked through the custom script extension above is also on my GitHub repository and generates the final init.d script for the service registration based on the input parameters. These input parameters are the user under which the daemon should run, the subscription ID and the resource group name. Here’s an excerpt of the script that builds the GoLang app and generates the target init.d script:

# Next compile the Go Application
mkdir ./app
mv *.go ./app

export PATH="$PATH:/usr/local/go/bin"
export GOPATH="`realpath ./`/app"
export GOBIN="$GOPATH/bin"
go get ./app
go build -o msitests ./app

sudo mkdir /usr/local/msiandmeta
sudo cp ./msitests /usr/local/msiandmeta
sudo chown -R $adminName:$adminName /usr/local/msiandmeta

# Generate the init.d script from the template by replacing the tokens
cat ./ \
| awk -v USER="$adminName" '{gsub("__USER__", USER)}1' \
| awk -v APP_NAME="msitests" '{gsub("__APP_NAME__", APP_NAME)}1' \
| awk -v APP_PATH="/usr/local/msiandmeta" '{gsub("__APP_PATH__", APP_PATH)}1' \
| awk -v SUBS="$subscriptionId" '{gsub("__SUBSCRIPTION_ID__", SUBS)}1' \
| awk -v RGROUP="$resGroup" '{gsub("__RESOURCE_GROUP__", RGROUP)}1' \

# Now make sure the script is handled by the system for starting/stopping the service
sudo cp ./ /etc/init.d
sudo chmod +x /etc/init.d/
sudo update-rc.d defaults

With that, the GoLang application that accesses the ARM REST APIs through the MSI and the instance metadata service as part of this sample should run automatically and always find the correct subscription ID and resource group name in its environment variables, since they’re set by the init.d script generated from the template!
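The token replacement itself is nothing magic. My setup script does it with awk, but the same idea can be sketched in Go with strings.NewReplacer, which might be handy if you prefer to generate such files from a Go tool. The function name renderTemplate, the two template lines and the all-zero subscription ID below are assumptions made up for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// renderTemplate replaces every __KEY__ token in tmpl with its value.
// Tokens that have no entry in values are left untouched.
func renderTemplate(tmpl string, values map[string]string) string {
	pairs := make([]string, 0, len(values)*2)
	for key, value := range values {
		pairs = append(pairs, "__"+key+"__", value)
	}
	return strings.NewReplacer(pairs...).Replace(tmpl)
}

func main() {
	// Hypothetical template lines in the style of the init.d template.
	tmpl := "export SUBSCRIPTION_ID=__SUBSCRIPTION_ID__\nexport RESOURCE_GROUP=__RESOURCE_GROUP__\n"
	fmt.Print(renderTemplate(tmpl, map[string]string{
		"SUBSCRIPTION_ID": "00000000-0000-0000-0000-000000000000",
		"RESOURCE_GROUP":  "LinuxHaWithUdrs",
	}))
}
```

Unlike awk's gsub, the replacer treats both the token and the replacement as literal text, so characters such as & in a value need no escaping.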

Testing the environment

Once you have deployed the ARM template into your subscription, you should be able to call the GoLang application explained above through the Load Balancer, using the NAT ports for each server. It demonstrates the mechanics of the instance metadata service and the Managed Service Identity in action. I mapped each server to its own port purely for demo purposes, so that you can examine the different responses of the different servers without SSHing into any machine. The following screenshot shows this in action by comparing the responses from different servers.

Running the app in action

Of course, in the real world you would not expose these things directly, but rather use them from within your applications. For this sample, and for helping you ramp up on the details quickly, it should hopefully be helpful.

Final Words

Managed Service Identities and the in-VM Instance Metadata Service are extremely helpful, and it was long overdue to have these kinds of capabilities. Both services allow you to implement complex scenarios such as:

  • Implementing licensing and IP-protection strategies based on the in-VM instance metadata service.
  • Script automated configurations of clustered environments by being able to call into Azure Resource Manager REST APIs from within Virtual Machines without the need of managing secrets for Service Principals.
  • many, many more and similar scenarios.
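To make the "no secrets to manage" point concrete, here is a minimal Go sketch of the two HTTP requests involved. It assumes the link-local Instance Metadata Service address `169.254.169.254`, the `Metadata: true` header and the api-version values shown; the sample in the repository may use different endpoints or versions. The helper names are hypothetical. The sketch only builds and prints the requests, because they can only be answered from inside an Azure VM.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// imdsBase is the link-local address of the Instance Metadata Service.
const imdsBase = "http://169.254.169.254/metadata"

// newInstanceMetadataRequest builds the request for the VM's own metadata.
func newInstanceMetadataRequest(base string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, base+"/instance?api-version=2017-08-01", nil)
	if err != nil {
		return nil, err
	}
	// The service rejects requests that do not carry this header.
	req.Header.Set("Metadata", "true")
	return req, nil
}

// newTokenRequest builds the request for an access token of the VM's
// managed identity, scoped to the given resource (for example ARM).
func newTokenRequest(base, resource string) (*http.Request, error) {
	q := url.Values{}
	q.Set("api-version", "2018-02-01")
	q.Set("resource", resource)
	req, err := http.NewRequest(http.MethodGet, base+"/identity/oauth2/token?"+q.Encode(), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Metadata", "true")
	return req, nil
}

func main() {
	meta, err := newInstanceMetadataRequest(imdsBase)
	if err != nil {
		panic(err)
	}
	token, err := newTokenRequest(imdsBase, "https://management.azure.com/")
	if err != nil {
		panic(err)
	}
	// Inside an Azure VM these would be sent with http.DefaultClient.Do(req).
	for _, req := range []*http.Request{meta, token} {
		fmt.Println(req.Method, req.URL.String(), "Metadata:", req.Header.Get("Metadata"))
	}
}
```

The token returned by the second request would then be passed as a bearer token in the `Authorization` header of calls to the Azure Resource Manager REST API.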

With both services available on Azure, my previous blog post becomes obsolete for this specific scenario. There might still be many reasons for leveraging service principals in other scenarios, of course, so it remains a good source for learning details about service principals in Azure AD in general. But the specific scenario outlined in both that previous post and this one can be implemented far better with Managed Service Identities and the in-VM Instance Metadata Service combined!

I hope you enjoyed reading this and that it was valuable for you. We went through something very similar for a concrete scenario with one of my customers, and my plan is to post about that scenario as one of my next blogging activities.

Stay Tuned!

Azure VMs – SQL Server AlwaysOn Setup across multiple Data Centers fully automated (Classic Service Management)

Last December I started working with two of my peers, Max Knor and Igor Pagliai, with a partner in Madrid on implementing a Cross-Data Center SQL Server AlwaysOn availability group setup for a financial services solution which is supposed to be provided to 1000s of banks across the world running in Azure. Igor posted about our setup experience which we partially automated with Azure PowerShell and Windows PowerShell – see here.

At the moment the partner’s software still requires SQL Server in VMs as opposed to Azure SQL Databases because of some legacy functions they use from full SQL Server – therefore this decision.

One of the bold goals was to fully enable the partner and their customers to embrace DevOps and continuous delivery across multiple environments. For this purpose we wanted to FULLY AUTOMATE the setup of their application together with an entire cross-data-center SQL Server AlwaysOn environment as outlined in the following picture:

In December we did a one-week hackfest to start these efforts. We successfully set up the environment, but only partially automated. Over the past weeks we went through the final effort to fully automate the process. I’ve published the result on my GitHub repository here:

Deployment Scripts Sample Published on my GitHub Repository

Note: Not Azure Resource Groups, yet

Since Azure Resource Manager v2, which would allow us to dramatically improve the performance and reduce the complexity of the basic Azure VM environment setup, is still in Preview, we were forced to use traditional Azure Service Management.

But about 50%-60% of the efforts we have done are re-usable given the way we built up the scripts. E.g. all the database setup and custom service account setup which is primarily built on-top of Azure Custom Script VM Extensions can be re-used after the basic VM setup is completed. We are planning to create a next version of the scripts that does the fundamental setup using Azure Resource Groups because we clearly see the advantages.

Basic Architecture of the Scripts

Essentially the scripts are structured into the following main parts which you would need to touch if you want to leverage them or understand them for learning purposes as shown below:

  • Prep-ProvisionMachine.ps1 (prepare deployment machine)
    A basic script you should execute on a machine before starting first automated deployments. It installs certificates for encrypting passwords used as parameters to Custom Script VM Extensions as well as copying the basic PowerShell modules into the local PowerShell module directories so they can be found.
  • Main-ProvisionConfig.psd1 (primary configuration)
    A nice little trick by Max for providing at least some sort of declarative configuration: a separate script file that creates an object tree with all the configuration data typically used for building up the cluster. It contains cluster configuration settings, node configuration settings and default subscription selection data.
  • Main-ProvisionCrossRegionAlwaysOn.ps1 (main script for automation)
    This is the main deployment script. It performs all the actions to setup the entire cross-region cluster including the following setups:
    • Setup your subscription if requested
    • Setup storage accounts if they do not exist, yet
    • Upload scripts required for setup inside of the VMs to storage
    • Setup cloud services if requested
    • Create Virtual Networks in both regions (Primary/Secondary)
    • Connect the Virtual Networks by creating VPN Gateways
    • Set up the primary AD Forest VM and the Forest inside of the VM
    • Setup secondary AD DC VMs including installing AD
    • Provision SQL Server VMs
    • Setup the Internal Load Balancer for the AlwaysOn Listener
    • Configure all SQL VMs to have AlwaysOn enabled
    • Configure the Primary AlwaysOn node with the initial database setup
    • Join secondary AlwaysOn nodes and restore databases for sync
    • Configure a file-share based witness in the cluster
  • VmSetupScripts Folder
    This is essentially a folder with a series of PowerShell scripts that each perform a single installation/configuration step inside of the Virtual Machines. They are downloaded into the Virtual Machines and executed there through Custom Script VM Extensions.

Executing the Script and Looking at the Results

Before executing the main command, make sure to execute .\Prep-ProvisionMachine.ps1 to set up certificates or import the default certificate which I provide as part of the sample. If you plan to seriously use those scripts, please create your own certificate. Prep-ProvisionMachine.ps1 provides you with that capability, assuming you have makecert.exe installed somewhere on your machine (please check Util-CertsPasswords for the paths in which I look for makecert.exe).

# To install a new certificate

# To install a new certificate (overwriting existing ones with same Subject Names)
.\Prep-ProvisionMachine.ps1 -overwriteExistingCerts

# Or to install the sample certificate I deliver as part of the sample:
.\Prep-ProvisionMachine.ps1 -importDefaultCertificate

Then everything should be fine to execute the main script. If you don’t specify the certificate-related parameters as shown below I assume you use my sample default certificate I include in the repository to encrypt secrets pushed into VM Custom Script Extensions.

# Enter the Domain Admin Credentials
$domainCreds = Get-Credential

# Perform the main provisioning

.\Main-ProvisionCrossRegionAlwaysOn.ps1 -SetupNetwork -SetupADDCForest -SetupSecondaryADDCs -SetupSQLVMs -SetupSQLAG -UploadSetupScripts -ServiceName "mszsqlagustest" -StorageAccountNamePrimaryRegion "mszsqlagusprim" -StorageAccountNameSecondaryRegion "mszsqlagussec" -RegionPrimary "East US" -RegionSecondary "East US 2" -DomainAdminCreds $domainCreds -DomainName "msztest.local" -DomainNameShort "msztest" -Verbose

After executing a main script command such as the one above, you will get 5 VMs in the primary region and 2 VMs in the secondary region, which acts as a manual failover target.

The following image shows several aspects in action such as the failover cluster resources which are part of the AlwaysOn availability group as well as SQL Server Management Studio accessing the AlwaysOn Availability Group Listener as well as SQL Nodes, directly. Click on the image to enlarge it and see all details.

Please note that the failover in the secondary region needs to happen MANUALLY by executing either a planned manual failover or a forced manual failover as documented on MSDN. Failover in the primary region (from the first to the second SQL Server) is configured to happen automatically.

In addition, on Azure this means taking the IP cluster resource for the secondary region online, which by default is offline in the cluster setup as you can see in the previous image.

Customizing the Parts you Should Customize

As you can see in the image above, the script creates sample databases which it sets up for the AlwaysOn Availability Group to be synchronized across two nodes in the main. This happens based on *.sql scripts you can add to your configuration. To customize the SQL Scripts and Databases affected, you need to perform the following steps:

  • Create *.sql scripts with T-SQL code that creates the databases you want to create as part of your AlwaysOn Availability Group.
  • Copy the *.sql files into the VmSetupScripts directory BEFORE starting the execution of the main script. That way they are included in the package that gets pushed to the SQL Server VMs.
  • Open up the main configuration file and customize the database list based on the databases created with your SQL scripts as well as the list of SQL Scripts that should be pushed into osql.exe/sqlcmd.exe as part of the setup process for creating the databases.
  • Also don’t forget to customize the subscription name if you plan to not override it through the script-parameters (as it happens with the example above).

The following image shows those configuration settings highlighted (in our newly released Visual Studio Code editor which also has basic support for PowerShell):

Fundamental Challenges

The main script can primarily be seen as a PowerShell workflow (we didn’t have the time to really implement it as a Workflow, but that would be a logical next step after applying Azure Resource Groups).

It creates one set of Azure VMs after another and joins them to the virtual networks it has created before. It then executes scripts on the Virtual Machines locally which are doing the setup by using Azure VM Custom Script Extensions. Although custom script extensions are cool, you have two main challenges with them for which the overall package I published provides re-usable solutions:

  • Passing “Secrets” as Parameters to VM Custom Script Extensions such as passwords or storage account keys in a more secure way as opposed to clear-text.
  • Running Scripts under a Domain User Account as part of Custom Script Extensions that require full process level access to the target VMs and Domains (which means PowerShell Remoting does not work in most cases even with CredSSP enabled … such as for Cluster setups).

For these two purposes the overall script package ships with some additional PowerShell Modules I have written, e.g. based on a blog-post from my colleague Haishi Bai here.

Running Azure VM Custom Script Extensions under a different User

Util-PowerShellRunAs.psm1 includes a function called Invoke-PoSHRunAs which allows you to run a target script with its parameters under a different user account as part of a custom script VM Extension. A basic invocation of that script looks as follows:

$scriptName = [System.IO.Path]::Combine($scriptsBaseDirectory, "Sql-Basic01-SqlBasic.ps1") 
Write-Host "Calling into $scriptName"
try {
    $arguments = "-domainNameShort $domainNameShort " + `
                 "-domainNameLong $domainNameLong " +  `
                 "-domainAdminUser $usrDom " +  `
                 "-dataDriveLetter $dataDriveLetter " +  `
                 "-dataDirectoryName $dataDirectoryName " +  `
                 "-logDirectoryName $logDirectoryName " +  `
                 "-backupDirectoryName $backupDirectoryName " 
    Invoke-PoSHRunAs -FileName $scriptName -Arguments $arguments -Credential $credsLocal -Verbose:($IsVerbosePresent) -LogPath ".\LogFiles" -NeedsToRunAsProcess
} catch {
    Write-Error $_.Exception.Message
    Write-Error $_.Exception.ItemName
    Write-Error ("Failed executing script " + $scriptName + "! Stopping Execution!")
}

This function allows you to either run through PowerShell remoting or in a separate process. Many setup steps of the environment we setup do actually not work through PowerShell remoting because they rely on impersonation/delegation or do PowerShell Remoting on their own which imposes several limitations.

Therefore the second option this script provides is executing as a full-blown process. Since Custom Script Extensions run as Local System, it is nevertheless not as simple as just doing a Start-Process with credentials being passed in (or a System.Diagnostics.Process.Start() with different credentials). Local System does not have those permissions, unfortunately. So the work-around is to use the Windows Task Scheduler. For such cases the function performs the following actions:

  • Schedule a task in the Windows Task Scheduler with the credentials needed to run the process as.
  • Manually start the task using PowerShell cmdLets
    • (Start-ScheduledTask -TaskName $taskName)
  • Wait for the task to be finished from running
  • Look at the exit code
  • Throw an Exception if the exit code is non-zero, otherwise assume success
  • Delete the task again from the task scheduler
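The control flow of these steps can be sketched independently of PowerShell. The following Go example is a hedged illustration only: the `scheduler` interface, `runAs` and `fakeScheduler` are hypothetical names, and the fake stands in for the real Windows Task Scheduler so that the sequence (register, start, wait, check exit code, delete) can be followed and tested anywhere.

```go
package main

import "fmt"

// scheduler abstracts the Task Scheduler operations used by the work-around
// (hypothetical interface, not a real Windows API binding).
type scheduler interface {
	Register(task, command, user string) error
	Start(task string) error
	Wait(task string) (exitCode int, err error)
	Delete(task string) error
}

// runAs runs a command under another user by scheduling it as a task,
// starting it manually, waiting for it and checking its exit code.
func runAs(s scheduler, task, command, user string) (err error) {
	if err = s.Register(task, command, user); err != nil {
		return fmt.Errorf("register task: %w", err)
	}
	// Always remove the task again, even when a later step fails.
	defer func() {
		if delErr := s.Delete(task); delErr != nil && err == nil {
			err = fmt.Errorf("delete task: %w", delErr)
		}
	}()
	if err = s.Start(task); err != nil {
		return fmt.Errorf("start task: %w", err)
	}
	code, waitErr := s.Wait(task)
	if waitErr != nil {
		return fmt.Errorf("wait for task: %w", waitErr)
	}
	if code != 0 {
		return fmt.Errorf("task %q exited with code %d", task, code)
	}
	return nil
}

// fakeScheduler records the calls it receives and returns a canned exit code.
type fakeScheduler struct {
	exitCode int
	calls    []string
}

func (f *fakeScheduler) Register(task, command, user string) error {
	f.calls = append(f.calls, "register")
	return nil
}

func (f *fakeScheduler) Start(task string) error {
	f.calls = append(f.calls, "start")
	return nil
}

func (f *fakeScheduler) Wait(task string) (int, error) {
	f.calls = append(f.calls, "wait")
	return f.exitCode, nil
}

func (f *fakeScheduler) Delete(task string) error {
	f.calls = append(f.calls, "delete")
	return nil
}

func main() {
	failing := &fakeScheduler{exitCode: 3}
	err := runAs(failing, "setup-sql", "powershell.exe -File setup.ps1", `DOMAIN\admin`)
	fmt.Println("error:", err)
	fmt.Println("calls:", failing.calls)
}
```

Deleting the task in a deferred call means the scheduled task, which carries the credentials, does not stay behind when the script fails.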

This “work-around” helped us to execute all setup steps successfully. We also discussed it with the engineers building the SQL AlwaysOn Azure Resource Group template that is available for single-data-center deployments in the new Azure Portal today. They are indeed doing the same thing; only the details are a bit different.

Encrypting Secrets Passed to Custom Script VM Extensions

Sometimes we simply had to pass secret information such as storage account keys to custom script extensions. Since Azure VM Custom Script Extensions are logged very verbosely, it would be a piece of cake to get to that secret information by doing a Get-AzureVM and looking at the ResourceExtensionStatusList member, which contains the status and detailed call information for all VM Extensions.

Therefore we wanted to encrypt secrets as they are passed to Azure VM Extensions. The basic (yet not perfect) approach works based on some guidance from a blog post from Haishi Bai as mentioned earlier.

I’ve essentially written another PowerShell module (Util-CertsPasswords) which can perform the following actions:

  • Create a self-signed certificate as per guidance on MSDN for Azure.
  • Encrypt Passwords using such a certificate and return a base64-encoded, encrypted version.
  • Decrypt Passwords using such a certificate and return the clear-text password.

In our overall workflow all secrets including passwords and storage account keys which are passed to VM Custom Script Extensions as parameters are passed as encrypted values using this module.

Using Azure CmdLets we make sure that the certificates are published with the VM as part of our main provisioning script, as per Michael Washam’s guidance from the Azure Product group.

Every script that gets executed as part of a custom VM Script Extension receives an encrypted password and uses the module I’ve written to decrypt it and use it for the remaining script such as follows:

# Import the module that allows running PowerShell scripts easily as different user
Import-Module .\Util-PowerShellRunAs.psm1 -Force
Import-Module .\Util-CertsPasswords.psm1 -Force

# Decrypt encrypted passwords using the passed certificate
Write-Verbose "Decrypting Password with Password Utility Module..."
$localAdminPwd = Get-DecryptedPassword -certName $certNamePwdEnc -encryptedBase64Password $localAdminPwdEnc 
$domainAdminPwd = Get-DecryptedPassword -certName $certNamePwdEnc -encryptedBase64Password $domainAdminPwdEnc 
Write-Verbose "Successfully decrypted VM Extension passed password"

The main provisioning script encrypts the passwords and secrets using that very same module before being passed into VM Custom Script Extensions as follows:

$vmExtParamStorageAccountKeyEnc = `
    Get-EncryptedPassword -certName $certNameForPwdEncryption `
                          -passwordToEncrypt ($StorageAccountPrimaryRegionKey.Primary)

That way we at least make sure that no un-encrypted secret is visible in the Azure VM Custom Script Extension logs that can easily be retrieved with the Azure Service Management API PowerShell CmdLets.

Final Words and More…

As I said, there are lots of other re-usable parts in the package I’ve just published on my Github Repository which even can be used to apply further setup and configuration steps on VM environments which have entirely been provisioned with Azure Resource Groups and Azure Resource Manager. A few examples:

  • Execute additional Custom Script VM Extensions on running VMs.
  • Wait for Custom Script VM Extensions to complete on running VMs.
  • A ready-to-use PowerShell function that makes it easier to Remote PowerShell into provisioned VMs.

We also make use of an AzureNetworking PowerShell module published on the Technet Gallery. But note that we also made some bug-fixes in that module (such as dealing with “totally empty VNET configuration XML files”).

Generally the experience of building these ~2500 lines of PowerShell code was super-hard but a great learning experience. I am really keen to publish the follow-up post on this that demonstrates how much easier Azure Resource Group templates make such a complex setup.

Also I do hope that we will have such a multi-data-center template in the default gallery soon since it is highly valuable for all partners and customers that do need to provide high-availability across multiple data centers using SQL Server Virtual Machines. In the meantime we will try to provide a sample based on this work above as soon as we can have time/resources for implementation.

Finally – thanks to Max Knor and Igor Pagliai – without their help we would not have achieved these goals at this level of completeness!

Azure Batch – (Highly Scalable) Batch Processing with Microsoft Azure (and a Successor to GeRes2)

Batch processing is something nearly everyone I have been working with is doing in one way or another on Microsoft Azure. Last year some colleagues and I worked with several global partners that had this requirement. Since nothing at the scale of Azure Batch was available yet, we created GeRes2 as an open source project. GeRes2 covers batch processing in a simple, pragmatic yet scalable way (at full scale of Web/Worker roles as opposed to the limited scale of WebJobs SDK in WebSites). But now the times when you need GeRes2 are over. Instead you should definitely consider Azure Batch directly.

What is Azure Batch?

Azure Batch is a completely managed service, you could see it as “batch processing as a service”.

It can be used for anything from simple batch processing up to High-Performance-Compute (HPC) types of workloads. As a managed service, Azure Batch is fully operated by Microsoft and made accessible through an HTTP REST API. While for simpler requirements you might want to use WebJobs SDK, if you really need to scale beyond the options provided by WebSites and do not want to manage infrastructure (as is the case with Windows Server HPC Pack), Azure Batch is your best friend.

A sample on Azure Batch to help you get started!

As a first introduction and to help you get started, I thought to publish a sample inspired by a session from one of the Azure Batch product managers, Mark Scurrell, at TechEd EMEA 2014: an OCR image recognition based on the Tesseract OCR Engine. The whole sample is available on my repository:

The following sketch outlines the flow of the sample which I’ve built to help you get started. Note that I also include a PowerShell Script that sets up the environment in your Azure Subscription (please setup Azure PowerShell correctly, before). It creates all required Azure service accounts (storage, batch), updates app.config configurations, builds the sample and finally uploads sample data as it is needed for testing the scenario right away.

As you can see, the overall solution consists of a client that creates compute pools, submits jobs to that compute pool that use Tesseract for an OCR recognition on-top of PNG-based images stored in Azure Blob Storage and then makes the results available in Azure Blob storage, as well. Although there are different alternative ways with Azure Batch, for this first sample I also wrote a little console application that downloads the source-images from Azure Blob to the task virtual machines (TVM), processes them using tesseract.exe and then uploads the results back to BLOB-storage.

Understanding the fundamental Azure Batch Concepts

Before you can get started, you should understand the fundamental concepts of Azure Batch. Let’s get started with the following terms:

  • Azure Batch Account
    An account is a management unit used to group batch services and batch apps together in a single unit with security access keys.
  • Azure Batch REST API
    For each account, Batch is made available through an HTTP REST API. For .NET developers the team ships an SDK, already. Other languages will follow (or are available after I’ve published this article, already).
  • Azure Batch Apps
    Batch can be used in two flavors, through the low-level REST-API which is more complex but provides you with a bigger set of options and control or through a more managed experience with a management portal and a light-weight API called Batch Apps.
  • Compute Pools
    These are groups of compute nodes used for executing work. Pools can contain many compute nodes and can be configured with auto scaling rules so they add/remove resources based on the load on a pool.
  • Task Virtual Machines (TVM)
    A TVM is a single compute node which is part of a compute pool. Essentially behind the scenes TVMs are worker roles since Azure Batch by itself is implemented with Web/Worker roles. For you as developer, just think of them as scalable, stateless virtual machines.
  • Work Items
    Work Items are used to describe classes of units of work (aka jobs) and configure scheduling properties of jobs (e.g. execute once, regular time-controlled execution etc.). They also allow specifying the compute pools on which jobs of a work item are executed.
  • Jobs
    A job is a unit of work. It describes a set of concrete execution items called tasks. Each task gets executed on a TVM part of a pool that is tied to the work item the job belongs to.
  • Tasks
    A task is a single execution step of a job. Essentially a task is an executable that you need to provide and specify as part of the scheduling process that will be executed on TVMs.
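How these concepts nest can be summarized as a small data model. The Go types below are hypothetical and only mirror the terms above (pool, work item, job, task); they are not types of the Azure Batch SDK, and the names and values in `main` are illustrative.

```go
package main

import "fmt"

// Task is a single execution step: an executable that runs on a TVM.
type Task struct {
	Name        string
	CommandLine string
}

// Job is a unit of work made up of concrete tasks.
type Job struct {
	Name  string
	Tasks []Task
}

// WorkItem describes a class of jobs and the pool their tasks run on.
type WorkItem struct {
	Name     string
	PoolName string
	Jobs     []Job
}

// Pool is a group of TVMs (compute nodes) of one size.
type Pool struct {
	Name       string
	TVMSize    string
	TargetTVMs int
}

// countTasks returns the number of tasks across all jobs of a work item.
func countTasks(w WorkItem) int {
	n := 0
	for _, j := range w.Jobs {
		n += len(j.Tasks)
	}
	return n
}

func main() {
	pool := Pool{Name: "ocr-pool", TVMSize: "small", TargetTVMs: 5}
	item := WorkItem{
		Name:     "ocr-workitem",
		PoolName: pool.Name,
		Jobs: []Job{{
			Name: "job-0000000001",
			Tasks: []Task{
				{Name: "task_no_0", CommandLine: "cmd /c ocr.exe page0.png"},
				{Name: "task_no_1", CommandLine: "cmd /c ocr.exe page1.png"},
			},
		}},
	}
	fmt.Printf("work item %q runs %d task(s) on pool %q (%d x %s TVMs)\n",
		item.Name, countTasks(item), item.PoolName, pool.TargetTVMs, pool.TVMSize)
}
```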

Creating an Azure Batch Account with Azure PowerShell

The first thing you need to do is to create an Azure Batch Account. Note that at the time of writing this article, Batch was still in Preview. Therefore make sure you activate it for your subscription first. The PowerShell script I provide for setting up the sample does that for you. Note that Azure Batch can only be managed as part of the new Azure Resource Manager inside of a Resource Group. Therefore, in an Azure PowerShell session you first need to switch the Azure PowerShell mode to AzureResourceManager:

Switch-AzureMode -Name AzureResourceManager
New-AzureResourceGroup -Name $batchSampleResourceGroupName -Location $regionName
New-AzureBatchAccount -AccountName $azureBatchAccountName `
                      -ResourceGroupName $batchSampleResourceGroupName `
                      -Location $regionName

Below is a screenshot of the script I wrote to set things up, in action. It shows you how to call the script to set up the environment required for the demo:

Azure Batch NuGet Package

Now that we have an account, you can start developing. In Visual Studio you need to use a NuGet package called Azure Batch. Please make sure to use the core Azure Batch package and not the Batch Apps package in case you want to use the full-blown, low-level Batch APIs.

Creating Compute Pools using the API

The first thing you need to do is to create some compute pools with task virtual machines. The .NET SDK wraps the REST API for such purposes in a very convenient way. After having set up BatchCredentials and a BatchClient you can open various managers that encapsulate certain management operations, e.g. a Compute Pool Manager as shown below.

using (var pm = batchClient.OpenPoolManager())
{
    var pools = pm.ListPools().ToList();
    var poolExists = (pools.Select(p => p.Name).Contains(PoolName));
    if (!poolExists)
    {
        var newPool = pm.CreatePool(PoolName, "3", "small", 5);
        newPool.StartTask = new StartTask
        {
            ResourceFiles = binaryResourceFiles,
            CommandLine = "cmd /c CopyFiles.cmd",
            WaitForSuccess = true
        };
        newPool.CommitAsync().Wait();
    }
}

One interesting aspect of the compute pool creation above is the definition of a startup task. This startup task in case of the sample defines a list of files that should be downloaded from Azure BLOB storage to the TVMs as part of the bootstrapping process. That list of files is prepared earlier in the code as shown below:

var binaryResourceFiles = new List<IResourceFile>();
Console.WriteLine("Get list of 'resource files' required for execution from BLOB storage...");
foreach (var resFile in blobTesseractContainer.ListBlobs(useFlatBlobListing: true))
{
    var sharedAccessSig = CreateSharedAccessSignature(blobTesseractContainer, resFile);
    var fullUriString = resFile.Uri.ToString();
    var relativeUriString = fullUriString.Replace(blobTesseractContainer.Uri + "/", "");

    Console.WriteLine("- {0} ", relativeUriString);

    binaryResourceFiles.Add(
        new ResourceFile
            (
            fullUriString + sharedAccessSig,
            relativeUriString.Replace("/", @"\")
            )
        );
}

Note that even if the blob container is publicly accessible, I had to use shared access signatures to allow Azure Batch to download those files. These files will be placed in a directory dedicated to the startup task (meaning actual tasks executed later don’t have access to this directory). Therefore one of the files downloaded is a batch script which is then executed as part of the startup procedure. This batch script copies the files from the startup task working directory to the shared directory, to which all tasks have access, as shown below:

@echo off
echo "List Files for diagnostics..."
echo %WATASK_TVM_ROOT_DIR%
dir .\ /s
echo.
echo Moving BatchTesseractWrapper files to shared task directory
robocopy /MIR .\ %WATASK_TVM_ROOT_DIR%\shared
rem Compare numerically (no quotes) and map robocopy's success codes to exit code 0
if %errorlevel% LEQ 4 (
   exit /b 0
)

One little hint here: Azure Batch considers a task (incl. the startup task) to be successful when it returns an exit code of 0. Since I am using robocopy.exe in my script, I need to consider that robocopy has several non-zero success exit codes. Therefore I map those to the exit code 0.
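The same exit code mapping can be written as a tiny Go function, for example when a wrapper executable launches robocopy itself. This is a sketch under assumptions: `normalizeExitCode` is a hypothetical helper, and the threshold of 4 follows the batch script above (robocopy's exit code is a bit mask in which values of 8 and above indicate failed copies).

```go
package main

import "fmt"

// normalizeExitCode maps every exit code from 0 up to maxSuccess to 0,
// so that a scheduler that treats only 0 as success accepts them.
// All other codes are passed through unchanged.
func normalizeExitCode(code, maxSuccess int) int {
	if code >= 0 && code <= maxSuccess {
		return 0
	}
	return code
}

func main() {
	for _, code := range []int{0, 1, 3, 4, 8, 16} {
		fmt.Printf("robocopy exit %2d -> task exit %d\n", code, normalizeExitCode(code, 4))
	}
}
```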

After my startup task finally completed, I do have all the tesseract-binaries as well as the tesseract wrapper executable I’ve written (which downloads files from BLOB, processes them with tesseract.exe and then uploads the result back to BLOB) in the shared task working directory.

Note that all of these working directories for tasks on TVMs are placed under a sub-directory of the task root directory exposed through the WATASK_TVM_ROOT_DIR environment variable. These directories are:

  • %WATASK_TVM_ROOT_DIR%\shared
    is a shared directory to which all tasks executed on the TVM do have read and execute permissions.
  • %WATASK_TVM_ROOT_DIR%\startup
    is a directory dedicated to the startup tasks specified as part of the compute pool creation. No other tasks do have any access to this directory.
  • %WATASK_TVM_ROOT_DIR%\tasks\<workitemname>\<jobname>\<taskname>
    is the directory dedicated to task execution, where every work item, job and task gets its own directory.

A really cool tool for confirming that everything in our startup tasks did work is the Azure Batch explorer. It is also cool for exploring the true, detailed directory structure which I’ve outlined above. I might write about this tool in a subsequent blog post and will continue focusing on the code in this one.

Scheduling Jobs for Execution

Once the compute pool runs and all the TVMs are prepared with the tesseract binaries through the startup task, we can start scheduling jobs. This happens with the WorkItem manager and by creating a WorkItem with a Job and adding tasks to that job. In my sample I create one task for each file I want to OCR-recognize which I’ve previously uploaded to BLOB storage.

Note: the PowerShell setup script I provide uploads some sample data I’ve included in the git repository so that you can get started right away.

using (var wiMgr = batchClient.OpenWorkItemManager())
{
    var workItemName = string.Format("ocr-{0}", DateTime.UtcNow.Ticks);
    var ocrWorkItem = wiMgr.CreateWorkItem(workItemName);
    ocrWorkItem.JobExecutionEnvironment =
        new JobExecutionEnvironment
        {
            PoolName = PoolName
        };
    ocrWorkItem.CommitAsync().Wait();

    var taskNr = 0;
    const string defaultJobName = "job-0000000001";
    var job = wiMgr.GetJob(workItemName, defaultJobName);

    foreach (var ocrFile in filesToProcess)
    {
        var taskName = string.Format("task_no_{0}", taskNr++);
        var taskCmd =
            string.Format(
                "cmd /c %WATASK_TVM_ROOT_DIR%\\shared\\BatchTesseractWrapper.exe \"{0}\" \"{1}\"",
                ocrFile.BlobSource,
                Path.GetFileNameWithoutExtension(ocrFile.FilePath));

        ICloudTask cloudTask = new CloudTask(taskName, taskCmd);

        job.AddTask(cloudTask);
    }
    job.Commit();

The submission here happens with the WorkItemManager. Every WorkItem gets a default job (job-0000000001) which can be used immediately for adding tasks to be executed. This is exactly what is done in the code snippet above.

Note: since the job-creation might not be completed, yet, the call to wiMgr.GetJob() should be wrapped with some retry-logic.

For each file I’ve stored in BLOB storage I do create a task. That task ultimately just executes a command shell with the executable that downloads the source image to the TVM, calls tesseract.exe to OCR recognize the image and uploads the resulting text-file back to Azure Blob storage. Since this should be straight-forward for Azure-experienced developers, I leave it to you to look at the code on my github-repository.

Waiting for the tasks to complete and get results

This final step is optional, but in case your program needs to wait for the tasks to complete before doing something else, the code below is helpful.

    var toolBox = batchClient.OpenToolbox();
    var stateMonitor = toolBox.CreateTaskStateMonitor();
    var runningTasks = wiMgr.ListTasks(workItemName, defaultJobName);
    stateMonitor.WaitAll(runningTasks, TaskState.Completed, TimeSpan.FromMinutes(10));

    var tasksFinalResult = wiMgr.ListTasks(workItemName, defaultJobName);
    foreach (var t in tasksFinalResult)
    {
        Console.WriteLine("- Task {0}: {1}, exit code {2}", t.Name, t.State,
            t.ExecutionInformation.ExitCode);
    }

The Azure Batch SDK for .NET comes with a set of handy utility classes that let you do exactly that. Through the TaskStateMonitor utility class the code above waits until all tasks have completed their work (with a timeout of 10 minutes), loads the most recent data from the Azure Batch REST API and displays the results.

Important here is that every task, whether it succeeded or failed, will end up in the completed state unless you terminated it through the REST API before it completed. It’s up to you to check whether the task was successful by looking at the exit code of the executable for your task.
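The exit code check described above can be sketched as a small filter. This is an illustration in Go with a simplified, hypothetical taskResult type instead of the real SDK task objects, and it assumes the common convention that exit code 0 means success, which ultimately depends on the executable you run.

```go
package main

import "fmt"

// taskResult is a hypothetical, simplified view of a finished Batch task.
type taskResult struct {
	Name     string
	State    string // for example "Completed"
	ExitCode int
}

// failedTasks returns the names of all tasks that reached the completed
// state but whose executable reported a non-zero exit code.
func failedTasks(results []taskResult) []string {
	var failed []string
	for _, r := range results {
		if r.State == "Completed" && r.ExitCode != 0 {
			failed = append(failed, r.Name)
		}
	}
	return failed
}

// summary formats a one-line report that could be written to a log.
func summary(results []taskResult) string {
	return fmt.Sprintf("%d of %d tasks failed", len(failedTasks(results)), len(results))
}
```

A completed task with exit code 1 shows up in the failed list, while a completed task with exit code 0 does not.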

The actual results of the tasks should be visible now in your Azure BLOB storage account in a container called ocr-results as shown below.

Final Words

This article and the sample I published on GitHub should help you understand the basic principles and how to get started with Azure Batch. As one of the creators of GeRes2, which also does scalable execution of jobs on Worker Roles, I have experienced first-hand how much effort it is to build something like this on your own.

As a managed service, Azure Batch really does everything for you. You can focus on creating your tasks as well as scheduling your jobs/tasks instead of dealing with all the plumbing infrastructure (e.g. distributing work across compute nodes, compute node management, logging, auto scaling, task binary deployments etc.).

The only thing that you might miss compared to GeRes2 is notifications via Service Bus and SignalR. But to be honest, in GeRes2 those notifications only worked up to a maximum of 100 compute nodes, since we did not scale out the Service Bus message queues we used for the notifications. Of course we could have done that, but the partners who used notifications with GeRes2 did not need more than 100 instances, while those who ran GeRes2 with more than 100 instances did not use notifications. And there’s still the option of polling the tasks’ status through the APIs made available by Azure Batch. Eventually we’ll work on something to show you how to add this remaining piece to Azure Batch as well.

Azure Batch is really the premium service when it comes to highly scalable batch processing on Azure. It is easy to use and it saves you from a whole lot of plumbing work that needs to be done if you do batch processing manually. In a future blog-post I plan to write about my personal opinion and view on comparing Azure Batch to other batch processing options available on Azure such as WebJobs SDK or running HPC Pack in Windows Server VMs in Azure.

In the meantime, feel free to share some feedback or ask questions via Twitter.

Janet Moonshot & FreeRADIUS on Microsoft Azure – An important step for Researchers for being able to use Microsoft’s Public Cloud Platform

Over the past months I’ve spent some time working with Janet, the UK’s National Research and Education Network. As well as managing the operation and development of the Janet network, Janet runs a number of services for education and research, including operating eduroam(UK), the UK part of a large, global network established between all sorts of research and education facilities and institutions.

In addition to eduroam and other services, one of the most important projects Janet is leading is the development and standardization of an open platform for authentication, authorization and trust management based on existing standards, namely EAP, RADIUS and GSS-API/SSPI/SASL.

This platform is called Project Moonshot.

Simply put, Moonshot in conjunction with FreeRADIUS is an identity provider and a security token service. Nevertheless it is primarily based on the protocols mentioned above – EAP, RADIUS and GSS-API/SSPI/SASL instead of WS-Federation, OAuth or SAML-P (although one of the token formats supported by Moonshot is the SAML token).

What I personally think is really cool about Moonshot!?

The really cool and practical thing about being built on the protocols above is, in my personal opinion, that almost all relevant operating system platforms support them in a deeply integrated way, since these standards are also used by Kerberos. For example, Windows has SSPI built deeply into its logon process, which means that with an SSPI provider for Moonshot, the Windows logon itself can be sourced from a federation through a variety of trust relationships instead of from the OS or a domain controller directly. That is not possible with more widely known standards such as WS-Fed, OAuth or SAML-P, since they’re all web-focused.

Why is Moonshot so important for Microsoft and Microsoft Azure?

Independent of what I think the advantages are of Moonshot, the most important part for Microsoft and Microsoft Azure is, that Janet is working with NRENs, academia and research across Europe and internationally to establish Moonshot as THE prime authentication mechanism for research communities, building up trust relationships between them and thus allowing federated authentication and authorization for research projects across the world.

In other words: if Microsoft Azure wants to play an important role in research in the future, Moonshot needs to be somehow supported in Azure as a platform. Through our partnership and work with Janet we achieved a first step for this over the past months, together!

Moonshot IdP Base Image on VMDepot…

Working together with Janet we managed to get a base-image prepared, tested and published on Microsoft Open Technologies VMDepot that can be used by anyone who wants to get connected to research communities through Moonshot Trust Routers and IdPs for federated security.

You can find this image here on VMDepot for getting started.

Although it seems like a simple thing to do, we had to undergo a few steps to get this far. Moonshot was required to be updated to support the Linux-distributions officially supported on Microsoft Azure. Furthermore we had to test if the semantics of the protocols used, especially EAP, do work well on Microsoft Azure. At least for single VM deployments we did this and the image above on VMDepot contains all the bits with which we’ve tested.

Of course that’s just the first step and we know we need to take some future steps such as making the deployments ready for multi-instance deployments for the sake of high availability and eventually also performance. Nevertheless, this is a great first step which was required and enables us to move forward.

The image by itself is based on Ubuntu Linux 12.10 LTS, it has the Moonshot and FreeRADIUS package repositories configured correctly and has other useful packages installed that are required or nice to have for Moonshot and FreeRADIUS (such as “screen” for example).

Using the Moonshot/FreeRADIUS VMDepot Image

Next I’d like to summarize how you can make use of the Moonshot VMDepot image. Note that most probably you should be involved in academia or research projects for this to be useful to you :). Of course you can also set up your own, single IdP using the image, but the full power gets unleashed when you become part of the Janet Trust Router network, which is what I am focusing on in this blog post.

Let’s start with a few assumptions / prerequisites:

  • Assumption #1:
    Since the primary target group is academia and research which is very Linux-focused, I am assuming people who’re trying this will most probably try the steps below from a Linux-machine. Therefore I am only using tools that also work on Linux (or Mac), although I am running them from a Windows machine.
  • Assumption #2:
    For the steps to complete you need an active Microsoft Azure subscription. To get one, navigate to and click on the “Free Trial” button in the upper, right corner.
  • Assumption #3:
    You are able to get in touch with Janet to participate in their Moonshot pilot to get the credentials required to connect your Moonshot IdP/RP to their Trust Router Network and that way become part of the larger UK and global academia and research community implementing Moonshot.

Now let’s get started with the actual deployment of a Moonshot VM based on the image Janet and we have published together on VMDepot:

  • Install Node.js on your machine, if not done yet.
    Node.js is needed since the Microsoft Azure Cross Platform Command Line Interface which we will use for setting up the Azure environment is built with Node.js.
  • Install the Azure Cross Platform Command Line Interface (xplat CLI).
    The Azure xplat CLI is a command line interface that allows you to script many management operations for services in your Azure subscription from either a Linux, Mac or also a Windows machine. For more details on setting it up, please refer to the Azure xplat CLI homepage.
  • Import your Subscription Publish Profile Settings through the xplat CLI.
    Before you can issue any operation against your Azure subscription in the cloud through the xplat CLI, you need to download and import a credentials file. This is the only operation that requires a GUI with a web browser, so if you issue the following command, you should sit on a machine with x-Windows installed or be on a Mac or Windows machine. Open a shell window or a command prompt and execute the following command:
    • azure account download
      This will open a web browser and browse to a page where you’ll need to sign-in with the account that has access to your Azure subscription (your Microsoft account with which the subscription has been created or which is a Co-Admin of another subscription). It results in the download of a “xyz.publishsettings”-file which contains the credentials. Save that file to your local disk. Next execute the subsequent command:
    • azure account import <path & filename to xyz.publishsettings>
      This command finally makes the Azure xplat CLI aware of your credentials. After that step you can finally start with true management commands against your subscription.
    • Note: if you have multiple Azure subscriptions, you also need to select the subscription in which you want to create the VM using azure account set <subscription-id>
  • Create a VM image based on our VMDepot base image using the xplat CLI.
    Finally we can create the Virtual Machine based on the VMDepot image. For this purpose execute the following command in your previously opened shell:
    • azure vm create yourdnsprefix -o vmdepot-28998-1-16 -l "North Europe" yourusername yourpassword --ssh
    • This command creates a VM which will get a public DNS-name called “” through which you then can connect to your VM (e.g. via SSH).
    • The result of issuing the command should look similar to the following:
    • What you see here is that the script transfers the template for the virtual machine from VMDepot with the VMDepot image id 28998-1-16 to your storage account and then creates the VM from that template. Finally it does some clean-up stuff.
  • Make sure Moonshot/FreeRADIUS and SSH endpoints are open on the Azure firewall.
    The last step is to open up the required TCP endpoints on the Azure firewall. This can happen after the VM has been created successfully. The ports required are 2083 and 12309 for Moonshot/FreeRADIUS; SSH is open by default on port 22 because our previous command included the --ssh switch. Issue the following commands:
    • azure vm endpoint create-multiple DNS_PREFIX 2083:2083,12309:12309
    • The result should look similar to the following (note that I’ve added port 22 before, already, therefore you won’t see it in the screenshot):
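If you script these steps, the endpoint argument of the create-multiple command above is just a comma-separated list of port pairs. Here is a minimal sketch in Go that builds it, assuming, as in the command above, that the public port and the port inside the VM are always the same. The function name endpointSpec is my own and not part of the xplat CLI.

```go
package main

import (
	"strconv"
	"strings"
)

// endpointSpec maps every port to itself (public port == VM port) and joins
// the resulting pairs with commas, the shape used by the command above.
func endpointSpec(ports ...int) string {
	pairs := make([]string, 0, len(ports))
	for _, p := range ports {
		s := strconv.Itoa(p)
		pairs = append(pairs, s+":"+s)
	}
	return strings.Join(pairs, ",")
}
```

Calling endpointSpec(2083, 12309) yields the same 2083:2083,12309:12309 argument that is typed by hand above.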

After you’ve completed those steps and the VM has been created successfully, you need to connect to the VM and perform the final Moonshot/FreeRADIUS configuration steps. These are pretty much the same as those you’d need to do on an on-premises machine in your own data center; we’ve prepared everything in that image so that it should work smoothly in Azure.

  • SSH into the newly created VM.
    Make sure you connect as root so you can perform all administrative tasks. All subsequent steps are to be executed in that SSH-session to your newly created VM!
  • Update to the latest package versions.
    Since the image is Ubuntu-based, use apt-* to update to the latest version of the packages. Issue the following commands:
  • Update the FreeRADIUS certificate files to match your organizational values.
    As part of the bootstrapping process, Moonshot and FreeRADIUS generate certificate files required for setting up trust relationships between your RP/IdP and other RP/IdPs. These are generated through openssl based on settings-files prepared in the bootstrap-image from VMDepot. You should customize those to match your organizational values, for example such as the common name to be used for your organization and IdP. Perform the following steps:
    • Switch to the directory /etc/freeradius/certs.
    • Open the file ca.cnf and update the following values to match your own values:
      • emailAddress
      • commonName

    It should look similar to the following if you’re using VI, if you’re really taking it serious, then also update the other values (e.g. passwords for private key files):

    • Open the file server.cnf and update the following values to match your own values:
      • emailAddress
      • commonName
      • It should look similar to the following if you’re using VI:
    • Finally update the same values also in the file client.cnf to match your own values:
    • Execute the command sudo /etc/freeradius/certs/bootstrap. This produces a lot of output, but at the end your screen will look similar to the following after executing this command:
  • Fine-tune Client Private Key Files:
    Next you need to “fine-tune” the private key files for the clients. This is supposed to be something that will be fixed/made easier in future versions of Moonshot and FreeRADIUS. Perform the following steps:
    • Change to the directory /etc/freeradius/certs.
    • Run the following command: cat client.crt client.key > client.txt
    • Now overwrite client.key with client.txt by executing the following command:
      mv client.txt client.key
    • Open client.key in VI and delete all lines before the first -----BEGIN CERTIFICATE----- line as shown below:
  • Realm configuration – Part #1
    Now that all certificates are configured, you need to configure your “realm”-settings such as the name of your realm and other options Moonshot and FreeRADIUS allows you to set. For this blog-post we keep it with the simple creation of a realm for your setup:
    • Switch to the directory /etc/freeradius.
    • Open the file proxy.conf in VI and add the following section anywhere in the file:
    • The realm you select should match the DNS-name you’re planning to use for setup. This DNS-name should then be mapped using a DNS CNAME-alias or DNS A-Record to your setup in Azure.
    • You can look at the sample-realm configurations in the file so that you can decide which other options you’d prefer to set for your setup. For this post we keep things at a default-setup.
  • Realm configuration – Part #2:
    For the next realm-setting perform the following steps in the SSH-session:
    • Open the file /etc/freeradius/mods-enabled/realm in VI for editing.
    • Add the following section at any place in the file:
      realm suffix {
      change rp_realm = “”
    • Make sure you use the same domain-name as before (e.g. and that the name you specify here ( is a resolvable DNS-name.
    • The results should look similar to the following:
  • Modify post-authentication step that issues the SAML assertion.
    Next you need to modify the post-authentication steps. One action that happens in those post-authentication steps is the definition of SAML-assertions that get issued as a token after a successful authentication. We prepared the image with a default-template that you can customize based on your need. But even if you don’t customize the assertions, there’s one step you need to complete and that’s bringing your realm into the context of the post-authentication steps.
    • Open the file /etc/freeradius/sites-enabled/default with VI.
    • Search for a configuration section starting with post-auth { … }.
    • Modify it to issue the SAML-assertion for your realm as follows:
      post-auth {
          if (Realm == LOCAL) …
      change to
          if (Realm == “”) (same as above)
    • The result should look similar as the following:
  • Request and setup Trust Router Credentials (through Janet).
    As mentioned, a Moonshot/FreeRADIUS install is most useful when you connect it to a research community. For this purpose get in touch with Janet to join their pilot. Once accepted onto the pilot, Janet will send you Trust Router credentials which allow you to get into a federation with the research network Janet operates.
    • Janet (or other Trust Router operators) will send you the trust router credentials for setting up the trust relationship as an XML file. Put that XML-file on your Moonshot VM created earlier.
    • Next execute the following commands (assuming the XML-file with the Trust Router credentials is called mytrustcreds.xml):
          su --shell /bin/bash freerad
          unset DISPLAY
          moonshot-webp -f mytrustcreds.xml
    • With those credentials your IdP/RP will be able to connect and federate with the Trust Router network Janet operates for academia and research (or the one you’ve received the credentials for).
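The “Fine-tune Client Private Key Files” step above boils down to a simple text transformation: concatenate certificate and key, then drop everything in front of the first certificate marker. The following Go sketch shows that transformation on plain strings. It is only an illustration of what the cat/mv/VI steps achieve, it does not read or write the files in /etc/freeradius/certs, and the name combineCertAndKey is made up for this example.

```go
package main

import "strings"

// certMarker is the first line of a PEM-encoded certificate.
const certMarker = "-----BEGIN CERTIFICATE-----"

// combineCertAndKey concatenates the certificate text and the key text (like
// "cat client.crt client.key") and removes everything that precedes the first
// certificate marker. The second return value is false if no marker is found.
func combineCertAndKey(crt, key string) (string, bool) {
	combined := crt + key
	idx := strings.Index(combined, certMarker)
	if idx < 0 {
		return "", false
	}
	return combined[idx:], true
}
```

Any human-readable text in front of the PEM block is removed, while the certificate and the private key that follows it stay untouched.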

Finally that’s it, we’ve completed all steps for configuring the Moonshot/FreeRADIUS setup. Now it’s time to test the environment or start using it for your single-sign-on and authentication purposes. A simple test for your environment together with Janet could look as follows:

  • Open up three terminal sessions to your Moonshot/FreeRADIUS VM you just created.
  • In Terminal #1 perform the following steps:
    • Open /etc/freeradius/users using VI.
    • Look for the following line: testuser Cleartext-Password := "testing"
    • Leave it for the test or modify it as per your needs. Also that’s where you could add your own users of your IdP. If you leave it as above that’s the credentials you can use for testing.
    • Now execute the following commands:
      • su --shell /bin/bash freerad (runs a shell under the FreeRADIUS user)
      • freeradius -fxx -l stdout (runs freeradius for debugging with logging to stdout)
  • In Terminal #2 perform the following steps:
    • moonshot-webp -f <path to previously received trust router credentials XML>
    • tids <your-external-ip> /var/tmp/keys
      • The external IP for your Azure VM is visible in the Azure Management portal ( for your virtual machine.
      • is an example for a trusted trust router. In case you federate with Janet, that’s most likely the one you’ll use.
  • In Terminal #3 perform the following step:
    • tidc {your rp-realm}
  • Important Note: for the commands above to succeed, you need to have valid Janet Trust Router credentials and Janet needs to have your IdP/RP configured in their trust-settings as a trusted party! Otherwise later when executing the tidc-command the test will fail!
  • Finally to complete the test someone needs to use Moonshot and its identity selector on a client machine to authenticate using your IdP. The best way to do that is using the LiveDVD for Moonshot provided by Janet.

That’s it, now you have your Moonshot / FreeRADIUS IdP to get yourself connected with a huge community of researchers, scientists and students across the world… for further questions it’s best to get in touch with the people from Janet and Moonshot via And go to the Moonshot home page to find more details here:

Cloud – Windows Azure – Combining PaaS & IaaS to get best of both worlds in your Architecture

Over the past 2 years I have been working with many ISVs (Independent Software Vendors) to get their products and platforms to the Public Cloud on Windows Azure. In almost all cases the requirements and motivations from those ISVs did include one or a combination of the following reasons and/or expectations:

  • Expand beyond the own country, get global / international.
  • Be able to scale faster and easier with less amount of effort.
  • Reduce effort and costs for operations management.

Of course there are many more reasons and motivations why (or why not) an ISV or a company would consider (or not) cloud computing. But these are very common ones.

When looking at those requirements above there’s one piece they have in common: to be successful, the ISVs need to spend less time on managing their infrastructure, networking configurations and operating systems (e.g. patching etc.). With such requirements in mind I’d definitely rather look into automatically managed service offerings from Cloud Platforms such as Azure (in other words: Platform-as-a-Service and Software-as-a-Service), because you will want as much automatic management & setup as possible to achieve your goals.

But in practice things are often more difficult…

How far the goals above can be achieved requires a detailed look at the initial situation of the ISV and its application. Specifically, the application architecture and the technologies used are of major relevance. Not all techniques, technologies and approaches work well in Platform-as-a-Service runtimes such as Windows Azure Web Sites, Mobile Services or Cloud Services (often for a good reason, sometimes because some features are not available, yet). Let’s look at a typical example architecture we see most often with software vendors nowadays:

As you can see, we do have an ASP.NET MVC web front-end, some services performing more complex computational or IO-intensive tasks in the background, a database cluster (for high-availability) and a storage-system for documents, videos and other binary data. Looking at it, the naive mapping for Azure could work as follows with pure Platform-as-a-Service and ready-to-use services (such as Azure storage). That way we would not have to deal with any kind of traditional operations management at all – a truly nice vision and in my opinion something that always should be on a long-term roadmap:

Component                        | Windows Azure Service
ASP.NET MVC Application          | Web Sites or Cloud Services
Computational background process | Cloud Services with Worker Roles
SQL Server Cluster               | Azure SQL Database
Storage Cluster                  | Azure BLOB Storage

Looks pretty simple, and it would be great if it were always that easy. In practice we need to look at each component to see if it does or uses something that is not built for working in Platform-as-a-Service environments. If there’s nothing like that, definitely go for it because you’ll benefit most from the Cloud and Azure then. If there are challenges, we need to consider alternatives: either adapt your product/code base or select another alternative.

And in case of Windows Azure that other alternative to PaaS definitely can be Windows Azure Virtual Machines, which is IaaS (Infrastructure-as-a-Service) on Azure. Let’s look a little bit deeper into the sample architecture above, look at some of the most important questions I typically ask and pick some assumptions for this post.

  • Storage Cluster
    Questions: How well is access to storage encapsulated? Is it spread across all source files or is there a central implementation, e.g. with the repository pattern?
    Assumption: Let’s assume access to the file system is centrally encapsulated in the code base in a repository class. This can be easily exchanged with a BLOB-storage-based implementation.
    Conclusion: Leverage BLOB storage as a ready-to-use service from Azure.
  • ASP.NET MVC Application
    Questions: Stateless? Persistent local file storage? Installation of 3rd-party components needed?
    Assumption: Assume the app uses 3rd-party components and local file storage and is stateless (load-balancer ready with round-robin algorithm).
    Conclusion: Web Sites will not work because of the 3rd-party components to be installed, but Cloud Services is a fit since the app is stateless and file storage can be outsourced to Azure BLOB storage.
  • Background process
    Questions: Windows? Linux? Persistent local file storage? Installation of 3rd-party components?
    Assumption: Let’s assume the background job runs on Windows, can work asynchronously in the background and needs no 3rd-party components.
    Conclusion: Cloud Services workers are a perfect match since async processing is possible and file storage can be easily replaced by BLOB storage.
  • SQL Server Cluster
    Questions: SQL features used? Performance requirements?
    Assumption: Let’s assume our SQL Server database uses .NET CLR procedures and encryption functions.
    Conclusion: This is the only case where we cannot use the Platform-as-a-Service offering from Azure. We need to fall back to Infrastructure-as-a-Service and run SQL Server in a Virtual Machine.
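The conclusions from the analysis above can be written down as a small rule function. The Go sketch below only encodes the questions and answers of this sample architecture. The workload type, its fields and the recommend function are hypothetical, and a real assessment would of course look at many more criteria.

```go
package main

// workload describes one component of the sample architecture. The fields
// are hypothetical and cover only the questions asked in this post.
type workload struct {
	Kind            string // "files", "web", "worker" or "sql"
	ThirdPartySetup bool   // 3rd-party components must be installed on the machine
	SQLServerOnly   bool   // uses SQL Server features such as CLR procedures or encryption functions
}

// recommend returns the Azure service this post would pick for the workload.
func recommend(w workload) string {
	switch w.Kind {
	case "files":
		return "Azure BLOB Storage"
	case "web":
		if w.ThirdPartySetup {
			return "Cloud Services"
		}
		return "Web Sites or Cloud Services"
	case "worker":
		return "Cloud Services with Worker Roles"
	case "sql":
		if w.SQLServerOnly {
			return "SQL Server in a Virtual Machine"
		}
		return "Azure SQL Database"
	default:
		return "needs manual review"
	}
}
```

With the assumptions from above, only the SQL Server component ends up on a Virtual Machine, while everything else stays on PaaS and ready-to-use services.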

The final architecture – Mixing Virtual Machines and Cloud Services…

Since we would like to be as effective and efficient as possible I definitely recommend to use Platform-as-a-Service and Software-as-a-Service where possible. Given the above sample-analysis for this example that’s the case for all components except SQL Server. Finally that leads to the following architecture in Windows Azure:

Setting-up the infrastructure in Azure (basic steps)…

To set up the architecture above in Windows Azure, you need to follow the subsequent steps in this order. Note that this is just a quick overview; in the next post I’ll give you a detailed step-by-step guide based on an example I’ll publish on my Codeplex workspace.

  1. Create an affinity group.
    All networks, virtual machines and cloud services you want to combine through a virtual network MUST be placed into the SAME affinity group.
  2. Setup a “Virtual Network” in Windows Azure.
    This network gives you a private network with subnets in Azure that allows your Cloud Services and Virtual Machines to interact with each other. The nice thing is that as long as you don’t do VPN, this service is free of charge. Also note that the VMs (IaaS only, not PaaS) will keep the same IP addresses assigned inside of the Virtual Network as long as you don’t DELETE the VMs.
  3. Create a new Virtual Machine in the network and configure SQL Server.
    After the network is created, create a VM and make sure you add it to the virtual network. After the VM has been created, perform the following steps:

    1. Open up port 1433 in the VM. That enables communication on port 1433 ONLY INSIDE the Virtual Network. If you also want it available externally, you need to open the port in the endpoint configuration on the management portal of Windows Azure.
    2. Configure SQL Server using SQL Authentication (unless you also have an AD deployed in a VM in Azure, then you can also use Windows Authentication).
    3. Import your database, create a login with SQL Authentication and make sure to provide it access to the database.
    4. Finally open up a command prompt, type ipconfig and write down the IP address. Note that the address will be constant as long as you don’t delete the VM. Please DO NOT assign a static address since this is not supported in Azure VMs!!
  4. Create & deploy a Cloud Service Package for your web site.
    Finally, for your ASP.NET web application (mentioned in the sample above) create a cloud service package and add the network configuration to your “ServiceConfiguration.Cloud.cscfg” XML configuration file. Before publishing make sure that your database connection string points to the IP address you’ve written down for your VM in step 3.
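To make the last point a bit more tangible, here is a minimal Go sketch that assembles a SQL Server connection string from the VM’s IP address inside the Virtual Network. The function name and all values you pass in are placeholders of mine. Port 1433 is the one opened in step 3, and the string uses the usual Server/Database/User ID/Password keywords of SQL Authentication.

```go
package main

import (
	"fmt"
	"net"
)

// sqlConnectionString builds a connection string that points to SQL Server
// on port 1433 of the given VM. It rejects values that are not IP addresses.
func sqlConnectionString(vmIP, database, user, password string) (string, error) {
	if net.ParseIP(vmIP) == nil {
		return "", fmt.Errorf("%q is not a valid IP address", vmIP)
	}
	return fmt.Sprintf("Server=tcp:%s,1433;Database=%s;User ID=%s;Password=%s;",
		vmIP, database, user, password), nil
}
```

In a real deployment the resulting string would go into the configuration of the cloud service rather than into code, so that it can be changed without rebuilding the package.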

Final Words and more scenarios!!

Windows Azure supports “mixed deployments” that include Virtual Machines (IaaS), Cloud Services (PaaS) as well as other platform services (e.g. storage, media services etc). That enables you to get best of both worlds: the full efficiency, automatic scale and automatic management of PaaS where possible while gaining full control through VMs where needed.

Typical scenarios that are enabled by combining Virtual Machines and Cloud Services on Azure where you run most of your workloads in automatically managed Platform-as-a-Service while running other pieces on VMs where you need full control include:

  • Combining your app with Linux-based work-loads because Linux runs in Azure Virtual Machines.
  • Special SQL Server requirements that lead to situations where you cannot leverage Azure SQL Database.
  • You need to run legacy components in your app that just don’t work inside of PaaS runtimes such as Cloud Services, Web Sites & Co.

With such principles and thoughts you definitely can move much faster to the public Cloud and Windows Azure when you need to! You don’t need to re-write your whole app and use VMs where applicable while moving to PaaS where you think you can benefit most out of it!!

Windows Azure – Console Apps in Platform-as-a-Service Worker Roles the right way!

I am currently working with a software vendor in the media space who has some really valuable software assets implemented as console applications, today. These command line applications are used for some high-performance image transcoding/encoding jobs and are implemented with C/C++ (originally built for Unix/Linux environments).

Now the partner wants to run this application as part of a Platform-as-a-Service (PaaS) deployment on Windows Azure using Web/Worker Roles for a new online service they’re currently building. But a requirement is to enable that scenario without rewriting the application and re-using it as it is.

In this blog post I’ll show how you can run console applications correctly in Windows Azure Worker Roles without a single modification of the original console application itself. This is a scenario I’ve been challenged with in some other engagements with my partners and the requirements and solutions were always similar.

I’ll start with a bit of background information and then dig into the solution based on an anonymized sample use case that pretty much captures all the requirements I’ve been confronted with in such cases with several partners. The complete sample code is published on my code workspace on CodePlex and is available as a download or by cloning the git repository I am using there.

Download the release sample code as ZIP archive here

Background #1 – Why not re-writing the application?

The console application that performs transcoding/encoding tasks is not fully owned by the partner. Furthermore the application runs perfectly fine on the machines they run in their own data center – and that should remain exactly the same as not all customers will be deployed in the cloud. Furthermore their on-premise data center runs on Linux VMs and therefore the console application needs to run on both, Linux and Windows (as they’re moving to Windows Azure PaaS and not IaaS).

Background #2 – Why do they want to do Windows Azure PaaS instead of IaaS?

Next you might think it would be easier to just take Linux VMs on Windows Azure and run their console application there. Indeed that was our first approach for the overall design of the solution. But using Windows Azure Virtual Machines (and IaaS in general) means that you have to maintain the virtual machines yourself while running the solution. Maintenance for me means things such as taking care of OS updates, updating the application code appropriately without downtime and the like.

That’s okay if you’re just running a handful of virtual machines. But in this case the partner needs to run more than 100 virtual machines in a scale-out scenario for a massive amount of transcoding jobs. At such a scale, maintenance becomes an annoying companion that increases the effort, time and money required to operate the environment.

With PaaS and Windows Azure Cloud Services (Web/Worker Roles), on the other hand, the operating system and all application deployments are managed automatically by the Windows Azure platform. The software vendor no longer needs to apply updates manually or maintain the operating system at all. For ongoing operations that means a huge amount of effort goes away, which frees up time for more important and valuable things. Therefore we jointly decided to move towards PaaS instead of continuing with the IaaS/virtual machine approach.

The Solution / Part #1 – High-Level architecture and “Conditions”

The sample implementation I am providing as an add-on to this post outlines the final architecture in a simplified way. Essentially we decided to run the console application in Windows Azure Worker Roles as part of an encoding/transcoding cloud service. This service gets its jobs from an Azure queue which is filled by a web service that is called by other, permitted applications of the software vendor’s customers. Basically the flow looks as follows:

  1. The originator of a job uploads assets into Azure BLOB storage.
  2. The originator then calls a web service to submit a new encoding/transcoding job.
  3. This web service validates the submitted job and if okay it adds an entry to an Azure TABLE with the details on the job.
  4. After that the web service adds a message to the queue to initiate the processing of the job by the worker.
  5. The worker picks up the job and reads the details (e.g. the asset in BLOB storage to be processed).
  6. The worker downloads the asset from the BLOB storage onto the local machine.
  7. Then the worker executes the console application against the downloaded asset and stores the result locally, as well.
  8. After that the worker uploads the resulting asset (the output from the console application) to BLOB storage again (in a separate container).
  9. When that’s done, the worker updates the table with the Job information and marks the job as “done”.
  10. Finally the worker deletes all assets from the local file system as they’re not needed on the executing compute instance, anymore.

As you can see from the solution above, the primary aspect to consider is that the local storage on Web/Worker Roles in Windows Azure is transient. All compute instances need to be treated as “stateless” and if there’s some state on these compute instances, it needs to be “temporary”. Therefore all assets that are an output from the processing need to be uploaded to Azure BLOB (or another permanent storage service such as Azure TABLEs, Azure SQL DB etc.) after they’ve been processed on a single compute instance, successfully.

On the other hand, the console application does not understand anything about Azure BLOB storage and the like. It typically understands command line parameters, input from the local file system and output to the local file system. Therefore the worker needs to download the input to the local file system so that the console application can process the assets, and after processing it needs to upload the results, which such console applications typically write to the local file system, to Azure BLOB storage. Given that the console application must not be changed in our scenario (which is often the case), the worker needs to take care of that plumbing.

Of course that is not the only scenario related to command line applications, but it is the one I’ve been challenged with most often by the software vendors we’ve been working with. Therefore it is common enough for me to outline in greater detail.

Below is a sketch of that architecture in a simplified way. More correct would be a complete implementation of a queue-centric workflow pattern as outlined in Jason Short’s blog.

Note that you can easily scale out that solution by adding any number of instances of the worker role you want. E.g. you could set the number of instances to 100 worker instances so that you can process 100 transcoding jobs in parallel. Of course in addition to that you could optimize that by being able to run multiple instances of your console app at the same time on a single compute instance. But still, the easy scale out just happens by adding compute instance nodes to this and every instance picks up jobs from the queue as available and processes them in parallel.

Matching the assets of the architecture sketch above to the sample-download I’ve published looks as follows:

Component in Sketch | Asset in Visual Studio Sample
Client | n/a – since I have built a web app
Web Service | Web Role ThumbnailFrontend (ASP.NET MVC)
Proc. Worker (processing Worker) | ThumbnailBackend (Azure Worker Role)
cmd.exe | ThumbnailProducerApp (Commandline App)
access to Queue, Jobs table, Blob | ThumbnailShared (class library)
Azure Cloud Service | ThumbnailCloudService

The Solution / Part #2 – Some Implementation Details

Now that we know how the solution is structured in general, we can take a look at some more details. The sample I’ve posted on my workspace is based on a fictitious (yet realistic) use case of generating thumbnails from images and it implements the architecture outlined earlier. For the concrete case I’ve mentioned above we just need to replace the use case (and therefore the command line tool in question) with the partner’s use case and tool and we’re all set. Of course we’ve implemented more details there (such as error handling, a dead-letter queue, etc.) which I’ve left out of my sample implementation for the sake of focus and simplicity, but the overall approach is the same.

In this section, instead of going into all details, let’s focus on the heart of the Worker Role which runs the legacy console application and does the work around executing it. There are a few things to keep in mind to do this correctly. These are:

  • First, the console application does not understand anything about Azure BLOB storage and never will (since it should not be changed and should run as-is on-premises and in the cloud). So content needs to be downloaded from and uploaded to BLOB storage by the worker role host process.
    • The logic of this one is implemented as part of the RoleEntryPoint-class you can implement for every Web/Worker Role in Windows Azure.
  • Since the console application relies on the local file system (for reading and writing content), we need to make sure to do this in the right way. What does that mean?
    • Well, the file system and drive structure for Web/Worker Roles is defined by Azure and might change over time.
    • Therefore hard-coding drive- and directory-access is definitely not the best solution.
    • For this purpose, Local Resources do exist in Windows Azure Web/Workers that give you access to the local file system in the right way.
  • Deploying the console application needs to happen alongside with the Worker Role project so that we can call it. Calling the console application also should not happen with hard-coded paths.
    • Deployment of the console app happens by adding it to the Worker Role project and making sure the “Copy to Output Directory” property is set to “Copy Always” in Solution Explorer. That way the console application executable will be included in the *.cspkg Azure deployment package by Visual Studio.
    • Second, to call the console application, the correct environment variables should be used. The environment variable we are using in our sample is the ROLEROOT environment variable, which points to the root directory of the extracted content of our *.cspkg Azure deployment package. From there we can easily find the console application using relative paths, which makes us “resistant” against possible future changes to the deployment file structure of Azure Web/Worker Roles.
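
The path lookup from the last bullet can be sketched as follows. This is a minimal, stand-alone illustration rather than the code from the sample: the class name LegacyAppLocator and the file name ThumbnailProducerApp.exe are assumptions made for this example. The sketch also appends a directory separator when RoleRoot comes back as a bare drive designator (such as E:), because Path.Combine would otherwise produce a drive-relative path.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the sample) that locates an executable
// deployed under the approot folder of a Web/Worker Role.
public static class LegacyAppLocator
{
    // Builds <RoleRoot>\approot\<executableName> without hard-coding a drive.
    public static string Resolve(string roleRoot, string executableName)
    {
        if (string.IsNullOrEmpty(roleRoot))
            throw new InvalidOperationException(
                "RoleRoot is not set; the code is probably not running inside a role instance.");

        // A bare drive designator such as "E:" needs a trailing separator,
        // otherwise Path.Combine returns the drive-relative "E:approot".
        var last = roleRoot[roleRoot.Length - 1];
        if (last != Path.DirectorySeparatorChar && last != Path.AltDirectorySeparatorChar)
            roleRoot += Path.DirectorySeparatorChar;

        return Path.Combine(roleRoot, "approot", executableName);
    }

    // Convenience overload that reads the RoleRoot environment variable.
    public static string ResolveFromEnvironment(string executableName)
    {
        return Resolve(Environment.GetEnvironmentVariable("RoleRoot"), executableName);
    }
}
```

Inside a worker role the call would simply be LegacyAppLocator.ResolveFromEnvironment("ThumbnailProducerApp.exe"), with the file name again being an assumed placeholder.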

Before looking at further coding details, let’s have a quick look at how the console application of my sample works so you get a better understanding of the scenario at a greater level of detail. Essentially the sample application creates thumbnails from existing images on the local file system and creates an output file for the generated thumbnail:

Think of this as your legacy-application you want to run in Windows Azure Worker Roles without being changed. Okay, now let’s look at some code from the worker role implementation. Looking at the solution structure, the RoleEntryPoint-implementation does all the plumbing (querying the queue, downloading and uploading content from Azure BLOB storage and calling the console application). For that purpose the console legacy app needs to be deployed with the worker role through the Azure deployment package. That is done by adding the console app to the project and making sure the “Copy to Output Directory”-property is set to “Copy Always” as shown here:

As a next step you need a file system directory from which your console application can read content and to which it can write new content. In our case that’s where the worker role places the source images it downloads to the local file system of the instance processing the job, and where the console application writes the resulting thumbnail. From there the worker host implementation picks the thumbnail up and uploads it to BLOB storage.

For that purpose you need to specify a local resource, which is essentially a local, temporary directory given to you by the Windows Azure Role Environment APIs for Web/Worker Roles. Because you specify a size for it, the environment can guarantee that this much space is available locally (also depending on the instance size you have chosen for your Web/Worker Role). Local resources are configured in the properties dialog for the Worker Role in the Windows Azure Project in Visual Studio as shown below:
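
To my knowledge, the setting made in that dialog ends up in the ServiceDefinition.csdef file of the cloud service project. A fragment could look roughly like the following; the resource name ImageProcessingTemp and the size of 1024 MB are placeholder values chosen for this illustration, not the ones used in the sample:

```xml
<WorkerRole name="ThumbnailBackend">
  <LocalResources>
    <!-- name and sizeInMB are placeholders for this illustration -->
    <LocalStorage name="ImageProcessingTemp" sizeInMB="1024" cleanOnRoleRecycle="true" />
  </LocalResources>
</WorkerRole>
```

The name given here is the one passed to RoleEnvironment.GetLocalResource(), and the RootPath property of the returned LocalResource is the directory to work in.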

Now that we know the “setup” for our project, we can look at some of the interesting pieces of code for the overall solution. The first thing I’d like to take a look at is the overall structure of our Worker Role and the initial steps before the actual processing starts as they are super-important for anything afterwards:

public class WorkerRole : RoleEntryPoint
{
    // ... some private variables are defined here ...

    public override bool OnStart()
    {
        // ... default implementation
    }

    public override void Run()
    {
        // ... some other code here not so important right now...

        // Now reserve a local temp path where images will be saved by the console app
        Trace.WriteLine("Getting a local directory for temporary work with image thumbnail generation...");
        var localResource = RoleEnvironment.GetLocalResource(/* name of the local resource, cut off in this listing */);

        // Next retrieve the path of the executable deployed with this worker role 
        // based on the environment 'RoleRoot' variable 
        var legacyAppExecutable = System.IO.Path.Combine(/* RoleRoot-based path segments, cut off in this listing */);

        // Receive messages, download the image, process them with the console app and upload the image to blob
        while (true)
        {
            var message = queueRep.GetMessageForJob(out jobId, out hasBeenDequeued);
            if (!string.IsNullOrEmpty(jobId))
            {
                // Process the message ...
            }
        }
    }

    // ... private implementation methods go here ...
}

There are a few super-important aspects in this code. First we get access to the local resource defined earlier using RoleEnvironment.GetLocalResource(). This gives us the local path we have requested through our project properties (see picture above) with the amount of disk space required. We don’t need to take care of disk and file system structures; we simply get a directory that works. Into this directory the worker will download content from BLOB storage, run the console app against it and pass the same directory to the console app as its output location.

Next we get the right path to the console application we’ve deployed alongside our worker role implementation. Instead of hard-coding any path in our solution, we use environment variables which are already pre-defined by the Windows Azure environment in our Web/Worker Role. Specifically, the “RoleRoot” environment variable points to the root directory to which our *.cspkg content has been extracted. There, under approot, we find the files from our project, and from there we can point to the legacy console app deployed with our package.

These two are the most important aspects to consider for calling the console application later on. After we’ve set them up, we start the processing loop for our worker, where we try to get messages from the queue and, if there are any, start processing them. The method ThumbnailQueueRepository.GetMessageForJob() also makes sure that poison messages do not keep our workers busy forever, as shown below:

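public CloudQueueMessage GetMessageForJob(out string jobId, out bool dequeued)
{
    // Fetch the next message and hide it from other workers for two minutes
    var message = _jobsQueue.GetMessage(TimeSpan.FromMinutes(2));
    if (message == null)
    {
        jobId = string.Empty;
        dequeued = false;
        return null;
    }

    if (message.DequeueCount > 3)
    {
        // Remove the poison message from the queue
        // (the call itself is missing from this listing; DeleteMessage is the presumed one)
        _jobsQueue.DeleteMessage(message);
        jobId = message.AsString;
        dequeued = true;
    }
    else
    {
        // Return the job ID
        jobId = message.AsString;
        dequeued = false;
    }

    return message;
}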


Note that I’ve decided to encapsulate queue, table and blob processing in repository classes in my sample implementation so that, in theory, they could easily be replaced with other implementations at a later point in time (such as using Azure Service Bus Queues instead of Azure Storage Queues for different scenarios). To be completely correct I should have moved poison messages to a dead-letter queue, but I didn’t do this for the sake of keeping the sample simpler.

Now let’s come to the heart of the worker where the console application gets ultimately called. This happens in the Worker Role project in a method called ProcessJob(). This method gets the path to the console application which we resolved using the RoleRoot-environment variable as well as the path to the local temporary directory.

It then downloads the source image, whose HTTP URL is stored in the Azure table I’ve created for maintaining the details of the job received through the queue. The job is read using JobsRepository.GetJob(). After the download it calls the console application simply using Process.Start() based on the path we have resolved earlier.

The console application of course saves the file into our temporary directory (which we pass in through command line arguments) from where the Worker implementation below picks it up and uploads it to BLOB-storage (encapsulated in the BlobRepository-class in my sample).

Finally, in any case, the worker implementation shown below tries to clean up all files from the temporary directory (which is just deleting the files using System.IO classes from the .NET Framework). The clean-up is important because if we don’t do it, the temporary space fills up over time and we run into out-of-space exceptions once the quota we’ve requested in the project properties is exhausted.
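
The body of the sample’s clean-up helper is not shown in this post. A helper of that kind could look roughly like the following sketch; the class name TempFileCleaner and the idea of returning the number of failures are assumptions made for this illustration, not the sample’s actual implementation.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical clean-up helper: best-effort deletion of temporary files.
public static class TempFileCleaner
{
    // Tries to delete every given file and returns how many could not be removed.
    // A failure is only logged, so a locked file never fails the whole job.
    public static int TryCleanUp(params string[] fileNames)
    {
        var failures = 0;
        foreach (var fileName in fileNames)
        {
            try
            {
                // File.Delete does nothing if the file does not exist
                File.Delete(fileName);
            }
            catch (IOException ex)
            {
                failures++;
                Trace.WriteLine("Could not delete " + fileName + ": " + ex.Message, "Warning");
            }
            catch (UnauthorizedAccessException ex)
            {
                failures++;
                Trace.WriteLine("Could not delete " + fileName + ": " + ex.Message, "Warning");
            }
        }
        return failures;
    }
}
```

Returning the failure count makes it easy to decide whether a later, time-controlled sweep over the temporary directory is needed.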

And that’s it, all of that is encapsulated in the ProcessJob()-method below, which is part of my Worker Role project from the sample implementation.


private void ProcessJob(string jobId, string appPath, string localTempPath)
{
    // First get the job data
    var job = JobsRepository.GetJob(jobId);
    job.Status = ThumbnailJobEntity.JOB_STATUS_RUNNING;

    // Temporary files used for processing by the legacy app
    var sourceFileName = Path.Combine(localTempPath, "source_" + jobId);
    var targetFileName = Path.Combine(localTempPath, "result_" + jobId);

    try
    {
        // Next download the source image to the local directory
        var httpClient = new HttpClient();
        var downloadTask = httpClient.GetAsync(job.SourceImageUrl);
        using (var sourceFile = new FileStream(sourceFileName, FileMode.Create))
        {
            // The body of this block is missing from the listing; presumably
            // the response content is copied into the local file like this
            downloadTask.Result.Content.CopyToAsync(sourceFile).Wait();
        }

        // Then execute the legacy application for creating the thumbnail
        var app = Process.Start
                    (
                        appPath,
                        string.Format("\"{0}\" \"{1}\" Custom 100 100", sourceFileName, targetFileName)
                    );
        // You should set a timeout to wait for the external process and kill if timeout exceeded
        // (the wait is required before ExitCode can be read)
        app.WaitForExit();

        // Evaluate the result of execution and throw exception on failure
        if (app.ExitCode != 0)
        {
            var errorMessage = string.Format("Legacy app did exit with code {0}, processing failed!", app.ExitCode);
            Trace.WriteLine(errorMessage, "Warning");
            throw new Exception(errorMessage);
        }

        // Processing succeeded, now upload the result file to blob storage and update the jobs table
        using (FileStream resultFile = new FileStream(targetFileName, FileMode.Open))
        {
            var resultingUrl = BlobRepository.SaveImageToContainer(resultFile, job.TargetImageName);

            job.Status = ThumbnailJobEntity.JOB_STATUS_COMPLETED;
            job.TargetImageUrl = resultingUrl;
        }
    }
    finally
    {
        // Deletes all temporary files that have been created as part of processing
        // It would also be good to run that time-controlled in regular intervals in case this fails
        TryCleanUpTempFiles(sourceFileName, targetFileName);
    }
}
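
The listing only notes in a comment that a timeout should be applied to the external process. A minimal sketch of how that could be done with the standard System.Diagnostics APIs follows; the class name LegacyAppRunner and the convention of returning null on timeout are assumptions made for this illustration and are not part of the sample.

```csharp
using System;
using System.Diagnostics;

// Hypothetical wrapper that runs a console application with an upper time limit.
public static class LegacyAppRunner
{
    // Returns the exit code of the process, or null if the process did not
    // finish within the timeout and had to be killed.
    public static int? Run(string appPath, string arguments, TimeSpan timeout)
    {
        var startInfo = new ProcessStartInfo(appPath, arguments)
        {
            UseShellExecute = false,   // start the executable directly
            CreateNoWindow = true      // there is no desktop on a role instance
        };

        using (var process = Process.Start(startInfo))
        {
            if (!process.WaitForExit((int)timeout.TotalMilliseconds))
            {
                try
                {
                    process.Kill();
                }
                catch (InvalidOperationException)
                {
                    // The process exited between the timeout and the kill attempt
                }
                process.WaitForExit();
                return null;
            }
            return process.ExitCode;
        }
    }
}
```

In ProcessJob() the call to Process.Start() and the exit code check could then be replaced by a single call such as LegacyAppRunner.Run(appPath, arguments, TimeSpan.FromMinutes(5)), treating a null result like any other failed run. The five-minute limit is only an example value.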

In Summary

The scenario outlined in this blog article is a very common one that partners have often challenged me with: run a console application in Windows Azure PaaS without modifying it. With the following considerations, that can be accomplished easily:

  • Use LocalResource through project properties and RoleEnvironment.GetLocalResource() instead of hard-coding paths.
  • Let the console application access all files required through that local resource path.
  • Download input files from BLOB storage to that local resource path and upload output files from it back to BLOB storage, so that web/worker role instances stay “stateless” and you are able to scale out (well, this is a MUST).
  • Instead of hard-coding the path to your console application, use the environment variable “RoleRoot” to get the correct path to your console application.
  • For a list of environment variables issued by Windows Azure look at the following blog post:
  • Make sure you clean up the files you store in the local temporary file system to avoid running into disk space problems and the like.

Ultimately, to me solutions like these also show that having legacy apps in place does not mean PaaS through Web/Worker Roles in Azure is not an option. It is, even if it might not be obvious at first.

I do hope this post, the sample and the details are helpful to many other developers. As mentioned, software development partners I’ve been working with have challenged me with such a scenario quite often, and therefore I decided to post a more detailed sample implementation of it.