Azure In-VM Instance Metadata & Managed Service Identities with ARM Templates and GoLang combined – inside/out

A lot has changed since my last blog post… we had a great and beautiful summer with an awesome vacation, and I am now part of the Azure Customer Advisory Team, which is the customer-facing part of Azure Engineering. So, I finally ended up in Jason Zander’s part of Microsoft – he is the person responsible for Azure itself. That means I am now involved in the most complex Azure projects we run with customers and no longer dedicated to SAP only, although I still work with SAP a lot.

In the meantime, a lot of Azure technology has expanded as well. In this post I want to focus on two specific features – the In-VM Instance Metadata Service and the Managed Service Identity (in short, MSI), which we started using in a customer project even before MSI became publicly available and announced.

I’ve already posted about the need for in-VM instance metadata, as well as an approach for allowing Virtual Machines to perform automated management operations, in a previous blog post. While what I wrote back then is technically still possible, MSI and in-VM Instance Metadata are now the recommended approach for such scenarios. So, you can consider this the long-awaited follow-up to that previous post!

Recap the scenario

The scenario I posted about back then was about virtual machines that need to read data about themselves and also modify their own configuration settings through Azure Resource Manager REST API calls. In the meantime, that very same customer I blogged about back then came to us with a new scenario that requires a similar capability.

Essentially, in that scenario a VM needed to capture its own IP addresses and determine the IP addresses of its peers in order to perform automated configuration of network routes and keepalived settings for an HA setup (more details to follow in a separate blog post).

All of this is possible through a combined use of the new Azure in-VM instance metadata service and the Managed Service Identity!

In-VM Instance Metadata in a Nutshell

This is really nothing special – AWS and other cloud providers have had it for ages. It essentially gives applications and scripts running inside of the VM an HTTP endpoint that is available from within the VM only. This endpoint returns fundamental details about a Virtual Machine such as its name, network configuration, unique identifiers etc. For Azure Virtual Machines, this endpoint is available at http://169.254.169.254/metadata/instance?api-version=2017-04-02 and returns JSON-formatted data about the virtual machine that looks similar to the following:

myuser@mylinuxvm:~$ curl -H Metadata:true "http://169.254.169.254/metadata/instance?api-version=2017-04-02" | jq

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   515  100   515    0     0   115k      0 --:--:-- --:--:-- --:--:--  125k
{
  "compute": {
    "location": "westeurope",
    "name": "mylinuxvm",
    "offer": "UbuntuServer",
    "osType": "Linux",
    "platformFaultDomain": "0",
    "platformUpdateDomain": "0",
    "publisher": "Canonical",
    "sku": "16.04-LTS",
    "version": "16.04.201708151",
    "vmId": "d7......-9...-4..4-b..b-2..........4",
    "vmSize": "Standard_D2s_v3"
  },
  "network": {
    "interface": [
      {
        "ipv4": {
          "ipAddress": [
            {
              "privateIpAddress": "10.1.0.5",
              "publicIpAddress": "xx.xx.xx.xx"
            }
          ],
          "subnet": [
            {
              "address": "10.1.0.0",
              "prefix": "24"
            }
          ]
        },
        "ipv6": {
          "ipAddress": []
        },
        "macAddress": "00........B3"
      }
    ]
  }
}
myuser@mylinuxvm:~$

It’s a simple REST service only accessible to anything that runs inside of the VM. All you need to take care of is ensuring that you pass the Metadata: true HTTP header when calling into the service. The call above shows only the fundamental basics; the service provides much more. For a complete look, review the documentation.

Managed Service Identities (MSI)

The in-VM instance metadata service is great if you need to query details about the VM itself. But what if you need to query more? For example, which other servers are available in the same resource group, so that you can configure keepalived for an HA setup with unicast instead of multicast for the availability pings? That’s especially important on Azure, since multicast is blocked by the VNET infrastructure. Finding out which other servers are available in the same resource group is not possible through the in-VM instance metadata service!

In my previous blog post about this topic, when Instance Metadata and MSI were not available yet, the scenario was for a Marketplace Image to open up ports on Azure NSGs as part of an automated process after the user entered more details into a post-provisioning registration application that ran inside of the VM. Again, such actions require access to the Azure Resource Manager REST APIs… and that, in turn, requires authenticating against Azure Active Directory with a valid principal.

In the past, you had to manually create a Service Principal for such actions and assign permissions for it in the Azure Subscription. Then, from within the VM, you had to sign in against Azure AD from your script or application using this Service Principal to gain access to the Azure Resource Manager REST APIs. This introduced a very delicate challenge: where would you store the credentials needed to sign in with the Service Principal from within the VM!?

With Managed Service Identities, these kinds of scenarios become much easier to implement, and the challenge of managing secrets for Service Principals inside Virtual Machines goes away. With MSI activated, all sorts of Azure service instances can get identities assigned which are fully managed by Azure through its Microsoft.ManagedIdentity resource provider.

MSIs can be enabled on Virtual Machines, but also on other types of services, as you can read in the documentation. You can enable them through the portal, via an ARM template, or with PowerShell or the Azure CLI!

Enabling Managed Service Identities

There are two pieces to it, which become more visible when you enable MSIs yourself:

  • Assigning an MSI to a resource, which essentially results in the creation of a “managed service principal” for an Azure resource such as a Virtual Machine that is made available to this Azure resource only!
  • Making tokens available to the respective resource for which the Managed Service Identity has been created. For VMs, this happens through a Virtual Machine Extension called ManagedIdentityExtensionForWindows or ManagedIdentityExtensionForLinux, respectively. When the extension is enabled for a virtual machine, any software running inside of the VM can request a token which is created as a result of an authentication against Azure AD with the MSI credentials. You don’t have to take care of those credentials since they are managed by the MSI infrastructure for you.

Once you have an MSI attached to a Virtual Machine (or another Azure resource), you can assign permissions to this identity for performing management operations against resources in your Azure subscriptions. The following screen shot shows this in the portal:

Assigning Permissions to a Managed Service Identity

If you need to assign the permissions via CLI, then you need to get the object IDs and App IDs for the service principals which are managed for you behind the scenes. Below is an excerpt of Azure CLI commands and results showing what you need to do!

mszcool@dev:~$ az vm show --resource-group LinuxHaWithUdrs --name lxHaServerVm0 --out json
{
  ...
  "id": "/subscriptions/a...fe/resourceGroups/LinuxHaWithUdrs/providers/Microsoft.Compute/virtualMachines/lxHaServerVm0",
  "identity": {
    "principalId": "f3....26d",
    "tenantId": "72....47",
    "type": "SystemAssigned"
  },
  "instanceView": null,
  "licenseType": null,
  "location": "westeurope",
  "name": "lxHaServerVm0",
  "networkProfile": {
    ...
  },
  "osProfile": {
    ...
  },
  "plan": null,
  "provisioningState": "Succeeded",
  "resourceGroup": "LinuxHaWithUdrs",
  "resources": [
    ...
  ],
  "storageProfile": {
      ...
    }
  },
  "tags": {},
  "type": "Microsoft.Compute/virtualMachines",
  "vmId": "52.....6bf"
}
mszcool@dev:~$ az ad sp show --id f3....26d
AppId             DisplayName       ObjectId          ObjectType
----------------  ----------------  ----------------  ----------------
8b............f1  RN_lxHaServerVm0  f3............6d  ServicePrincipal

As you can see, when you get the VM object through ARM, it contains a new section called identity with all the details about the managed service identity that you need to retrieve further details from Azure AD (above done by using the CLI as well).

That information can be used for things such as creating custom roles with permissions and then assigning the MSI to this custom role instead of assigning explicit permissions.

An end-2-end example

As I’ve mentioned before, one of the main use cases for combining these assets – also for my customer – is all about VMs that need to retrieve (and modify) details about themselves and their peers in a joint deployment. With a simplified example, I want to demonstrate the basic mechanics of the Instance Metadata Service and the Managed Service Identity so that you understand how you can make use of them in your own scripts and applications.

The sample builds the foundation for the scenarios I’ve explained earlier (VMs getting information about themselves and their peers). Rather than trying to cover it all in a single post, you can expect more complex scenario posts later on that make use of the mechanics explained here.

Essentially, the sample creates an infrastructure with a jump-box and a set of servers as shown in the following Azure Network Watcher topology diagram.

All of the code is available on my GitHub repository for review:

https://github.com/mszcool/azureMsiAndInstanceMetadata

Network Watcher Topology

On each of the servers, a simple Go-based REST API runs which shows the instance metadata of the server itself and retrieves all the other servers in the same resource group. The servers are exposed through an Azure Load Balancer using NAT, so that every server can be accessed individually on its own port. Note that I’ve set it up this way for demo purposes only, so that you can easily access each server and examine its instance metadata and the details it retrieves about its peers.

In a real-world environment, there are few – if any – scenarios in which you would expose instance metadata or data about peers directly to the public. So, to reiterate: this setup is for demo purposes only.

Assigning MSIs to the Servers and giving them permissions

For the sample, I used ARM templates to assign MSIs to the individual Server VMs and enable the respective MSI VM extension so that an application running inside of the respective VM can get a token for accessing resources under the identity of the VM it’s running in – the excerpt is from the azuredeploy.json template on my GitHub repository.

...
{
    "apiVersion": "[variables('computeAPIVersion')]",
    "type": "Microsoft.Compute/virtualMachines",
    "copy": {
        "name": "serverVmCopy",
        "count": "[parameters('serverCount')]"
    },
    "name": "[concat(variables('serverVmNamePrefix'), copyIndex())]",
    "location": "[parameters('location')]",
    "identity": {
        "type": "systemAssigned"
    },
    "dependsOn": [
        "[resourceId('Microsoft.Network/networkInterfaces',concat(variables('serverNicNamePrefix'),copyIndex()))]",
        "[resourceId('Microsoft.Storage/storageAccounts', variables('storageAccountName'))]",
        "[variables('serversAvSetId')]"
    ],
    "properties": {
        ...
    }
}
...
{
    "apiVersion": "[variables('computeAPIVersion')]",
    "type": "Microsoft.Compute/virtualMachines/extensions",
    "name": "[concat(variables('serverVmNamePrefix'),copyIndex(),'/IdentityExtension')]",
    "location": "[parameters('location')]",
    "copy": {
        "name": "serverVmMsiExtensionCopy",
        "count": "[parameters('serverCount')]"
    },
    "dependsOn": [
        "[resourceId('Microsoft.Compute/virtualMachines', concat(variables('serverVmNamePrefix'), copyIndex()))]"
    ],
    "properties": {
        "publisher": "Microsoft.ManagedIdentity",
        "type": "ManagedIdentityExtensionForLinux",
        "typeHandlerVersion": "1.0",
        "autoUpgradeMinorVersion": true,
        "settings": {
            "port": "[variables('msiExtensionPort')]"
        },
        "protectedSettings": {}
    }
}
...

As you can see above, the server VM gets a system-assigned identity in the ARM template. Further down in the template, the Managed Identity Extension is activated for each server VM instance. The variable msiExtensionPort is set to 50342 in my example, which means that an application or script running inside of the VM can retrieve a token for management operations on that port (http://localhost:50342/oauth2/token).

Taking care of RBAC

Now we have an MSI and the ability for applications to get tokens when running inside of the VM. But so far the possibilities of using that identity are limited since it does not have any permissions, yet. These are assigned through the ARM template, as well:

...
{
    "apiVersion": "[variables('authAPIVersion')]",
    "type": "Microsoft.Authorization/roleAssignments",
    "name": "[parameters('rbacGuids')[add(mul(copyIndex(),2),1)]]",
    "copy": {
        "name": "serverVmRbacDeployment",
        "count": "[parameters('serverCount')]"
    },
    "dependsOn": [
        "[resourceId('Microsoft.Compute/virtualMachines', concat(variables('serverVmNamePrefix'), copyIndex()))]"
    ],
    "properties": {
        "roleDefinitionId": "[variables('rbacContributorRole')]",
        "principalId": "[reference(concat(resourceId('Microsoft.Compute/virtualMachines',concat(variables('serverVmNamePrefix'),copyIndex())),'/providers/Microsoft.ManagedIdentity/Identities/default'),variables('managedIdentityAPIVersion')).principalId]",
        "scope": "[resourceGroup().id]"
    }
},
...

This assigns the MSIs created for the VMs permissions on the resource group they are deployed in – in this sample the Contributor role, which is more than enough to read resources. To get the role definition ID, which is stored in [variables('rbacContributorRole')] in my template, I executed an Azure CLI statement along the lines of the following:

az role definition list --query "[?properties.roleName == 'Contributor']" --out json

The next tricky bit is the name of the RBAC role assignment. Unfortunately, that needs to be a unique GUID. In my very simplified example, I pass in the GUIDs for the role assignments as parameters to the template:

...
"rbacGuids": {
    "type": "array",
    "metadata": {
        "description": "Exactly ONE UNIQUE GUID for each server VM is needed in this array for the RBAC assignments (sorry for that)! WARNING: if you want to keep this template deployment repeatable, you must generate new GUIDs for every run or delete RBAC assignments before running it, again!"
    },
    "defaultValue": [
        "12f66315-2fdf-460a-9c53-8654ae72c390",
        "12f66315-2fdf-460a-9c53-8654ae72c391",
        "12f66315-2fdf-460a-9c53-8654ae72c392",
        "12f66315-2fdf-460a-9c53-8654ae72c393",
        "12f66315-2fdf-460a-9c53-8654ae72c394",
        "12f66315-2fdf-460a-9c53-8654ae72c395",
        "12f66315-2fdf-460a-9c53-8654ae72c396",
        "12f66315-2fdf-460a-9c53-8654ae72c397",
        "12f66315-2fdf-460a-9c53-8654ae72c398",
        "12f66315-2fdf-460a-9c53-8654ae72c399"
    ],
    "minLength": 4,
    "maxLength": 18
}
...

The reason for this is to make it simple to replace those values as part of an integrated CI/CD pipeline with every continuous build that might involve such an ARM-template deployment. I might write a separate, short post about that topic. For now, I just grab a GUID for each server-RBAC-assignment I want to make as part of my template to generate a unique name for the assignment by using "name": "[parameters('rbacGuids')[add(mul(copyIndex(),2),1)]]".
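
Just to illustrate how fresh GUIDs could be produced for every run, here is a small, hypothetical Go helper (not part of the sample on GitHub) that prints random version-4 GUIDs; a CI/CD pipeline could inject its output into the rbacGuids parameter before each deployment:

package main

import (
    "crypto/rand"
    "fmt"
    "os"
    "strconv"
)

// Prints the requested number of random (version 4) GUIDs, one per line.
// A CI/CD pipeline could feed this output into the rbacGuids template parameter.
func main() {
    count := 10
    if len(os.Args) > 1 {
        if n, err := strconv.Atoi(os.Args[1]); err == nil {
            count = n
        }
    }
    for i := 0; i < count; i++ {
        b := make([]byte, 16)
        if _, err := rand.Read(b); err != nil {
            panic(err)
        }
        b[6] = (b[6] & 0x0f) | 0x40 // set version 4
        b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
        fmt.Printf("%x-%x-%x-%x-%x\n", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
    }
}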

The next tricky part of this section in the template is getting the ID of the principal created for the managed service identity of the respective server VM. This part of the template gets really hard to read, so I broke it up into multiple lines below, although you cannot do that in a real template:

    "properties": {
        "roleDefinitionId": "[variables('rbacContributorRole')]",
        "principalId": "[reference
        (
            concat(
                resourceId(
                    'Microsoft.Compute/virtualMachines',
                    concat(
                        variables('serverVmNamePrefix'),copyIndex()
                    )
                ),'/providers/Microsoft.ManagedIdentity/Identities/default'
            ),
            variables('managedIdentityAPIVersion')
        ).principalId]",
        "scope": "[resourceGroup().id]"
    }

The code uses the reference() template function to get the principal ID of the service principal created as the managed identity. That principal is a child object of the virtual machine, so we need to start with the resourceId() of the virtual machine and attach the identities section to it. Finally, the reference() function requires an API version, for which we use the version of the managed identity provider from the variable "managedIdentityAPIVersion": "2015-08-31-PREVIEW" in the code.

Getting a Token for your MSI

Based on the requirements of that specific customer project where we needed this functionality, I decided to use Go as the programming language. I am still not a GoLang expert, so I took the opportunity to learn. Using MSIs always follows two major steps:

  • Acquire a token through the locally installed VM Extension. This happens by calling the http://localhost:<port-selected-in-MSI-extension-settings>/oauth2/token endpoint which is offered by the MSI VM Extension.
  • Use that token in REST API calls to the Azure Resource Manager. These are regular REST calls with the HTTP Authorization header containing the bearer token retrieved earlier.

In my GoLang-based example, I have one module, contained in the file msitoken.go, which performs a REST call against the local OAuth2 endpoint offered by the VM Extension (note that this is an incomplete excerpt; for the full code, look at msitoken.go on my GitHub repo):

// etc. ...

const msiTokenURL string = "http://localhost:%d/oauth2/token"
const resourceURL string = "https://management.azure.com/"

// etc. ...

var myToken MsiToken

// Build a request to call the MSI Extension OAuth2 Service
// The request must contain the resource for which we request the token
finalRequestURL := fmt.Sprintf("%s?resource=%s", fmt.Sprintf(msiTokenURL, msiPort), url.QueryEscape(resourceURL))
req, err := http.NewRequest("GET", finalRequestURL, nil)
if err != nil {
    log.Printf("--- %s --- Failed creating http request --- %s", t.Format(time.RFC3339Nano), err)
    return myToken, "{ \"error\": \"failed creating http request object to request MSI token!\" }"
}

// Set the required header for the HTTP request
req.Header.Add("Metadata", "true")

// Create the HTTP client and call the local MSI token endpoint
client := &http.Client{}
resp, err := client.Do(req)
if err != nil {
    t = time.Now()
    log.Printf("--- %s --- Failed calling MSI token service --- %s", t.Format(time.RFC3339Nano), err)
    return myToken, "{ \"error\": \"failed calling MSI token service!\" }"
}
// Complete reading the body
defer resp.Body.Close()

// Now return the instance metadata JSON or another error if the status code is not in 2xx range
if (resp.StatusCode >= 200) && (resp.StatusCode <= 299) {
    dec := json.NewDecoder(resp.Body)
    err := dec.Decode(&myToken)
    // etc. ...
}
// etc. ...

Two aspects are important:

  • First, you always need to add the “Metadata: true” header to the call. Calls without it will be rejected!
  • Second, you need to add a query-string parameter to the request called resource=uri://to-your-resource-you-want-to-do-calls-to. In our case, this is always the Azure Resource Manager REST APIs resource https://management.azure.com/.

Once the call has executed, we have a valid token available. Note that we didn’t have to deal with any kind of secrets, which is super convenient: the Azure MSI infrastructure takes care of all the required details, and there is not even a way to get access to any secrets for Managed Identities.
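
For reference, the MsiToken type used in the code above is just a plain struct with JSON tags mapping the fields of the token response. Below is a minimal sketch of how such a struct could look, assuming the endpoint returns the standard Azure AD token response fields (the actual definition is in msitoken.go on my GitHub repository and may differ slightly):

// MsiToken models the JSON returned by the local MSI token endpoint.
// Field names assume the standard Azure AD token response format.
type MsiToken struct {
    AccessToken  string `json:"access_token"`
    RefreshToken string `json:"refresh_token"`
    ExpiresIn    string `json:"expires_in"`
    ExpiresOn    string `json:"expires_on"`
    NotBefore    string `json:"not_before"`
    Resource     string `json:"resource"`
    TokenType    string `json:"token_type"`
}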

Using the MSI Token

This is the rather simple part of the story, because it’s no different from any other Azure REST API call performed with any other kind of Azure AD user or principal. Once you have the token, you just use it in the HTTP Authorization header to call into the Azure Resource Manager REST APIs, and if permissions are set up as outlined earlier in the RBAC section, all should go well.

The following snippets are part of the GoLang source file mypeers.go:

const (
    environmentNameSubscription string = "SUBSCRIPTION_ID"
    environmentNameResourceGroup string = "RESOURCE_GROUP"

    restAPIEndpoint string =
        "https://management.azure.com/subscriptions/%s/resourceGroups/%s/%s"

    vmRelativeEndpoint string =
        "providers/Microsoft.Compute/virtualmachines?api-version=2016-04-30-preview"

    authorizationHeader string = "%s %s"
)

func GetMyPeerVirtualMachines(msiToken MsiToken) (vms string, errOut string) {
    // etc. ...
    subID := os.Getenv(environmentNameSubscription)
    resGroup := os.Getenv(environmentNameResourceGroup)
    // etc. ...

    // Create the final endpoint URLs to call into the Azure Resource Manager VM REST API
    finalURL := fmt.Sprintf(restAPIEndpoint, 
                              subID, resGroup, vmRelativeEndpoint)
    finalAuthHeader := fmt.Sprintf(authorizationHeader,
                              msiToken.TokenType, msiToken.AccessToken)

    // Build a request to call the Azure Resource Manager REST API
    req, err := http.NewRequest("GET", finalURL, nil)
    if err != nil {
        // etc. ...
    }
    req.Header.Add("Authorization", finalAuthHeader)

    // Create the HTTP client and call the Azure Resource Manager REST API
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        // etc. ...
    }
    // Complete reading the body
    defer resp.Body.Close()

    // Now return the raw VM JSON or another error if the status code is not in 2xx range
    if (resp.StatusCode >= 200) && (resp.StatusCode <= 299) {
        bodyContent, err := ioutil.ReadAll(resp.Body)
        if err != nil {
            // etc. ...
        }
        // etc. ...
        return string(bodyContent), ""
    }

    // etc. ...

    return "", fmt.Sprintf("{ \"error\": \"Azure Resource Manager REST API call returned non-OK status code: %d \" }", resp.StatusCode)
}

This code is super simple and just retrieves all other servers in the same resource group. It assumes that the resource group and the subscription ID are both set as environment variables before the Go application is started. This should give you an idea of how a server in a resource group could find its peers and get their private IP addresses to automatically configure components such as keepalived during an automated post-provisioning step, as sketched below.
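
To illustrate that last point, here is a minimal, hypothetical sketch (not part of the sample on GitHub) of how the raw JSON returned by GetMyPeerVirtualMachines could be reduced to a list of peer VM names. The ARM list response wraps the virtual machines in a value array; resolving the actual private IP addresses would require an additional lookup of each VM’s network interface, which is omitted here:

package main

import (
    "encoding/json"
    "fmt"
)

// armVMList models just the parts of the ARM virtual machine list response we need.
type armVMList struct {
    Value []struct {
        Name     string `json:"name"`
        Location string `json:"location"`
    } `json:"value"`
}

// peerVMNames extracts the VM names from the raw JSON returned by the ARM REST call.
func peerVMNames(rawJSON string) ([]string, error) {
    var list armVMList
    if err := json.Unmarshal([]byte(rawJSON), &list); err != nil {
        return nil, err
    }
    names := make([]string, 0, len(list.Value))
    for _, vm := range list.Value {
        names = append(names, vm.Name)
    }
    return names, nil
}

func main() {
    // Small, hand-crafted sample payload just to show the parsing logic.
    sample := `{ "value": [ { "name": "lxHaServerVm0", "location": "westeurope" },
                            { "name": "lxHaServerVm1", "location": "westeurope" } ] }`
    names, err := peerVMNames(sample)
    if err != nil {
        panic(err)
    }
    fmt.Println(names)
}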

The Instance Metadata Service

MSI and Azure Resource Manager REST API calls can help with retrieving details about peers or performing more complex management operations, including creating or updating resources, depending on the permissions given to a particular MSI. But for retrieving information about itself, a VM does not need to go through MSI and the ARM REST APIs, since there’s a much simpler approach if it’s just about details of the VM itself.

For a few months now, Azure has made an in-VM instance metadata service available which can be called from within the VM only, without additional authentication requirements. The documentation about the instance metadata service shows how to retrieve the data with simple tools such as curl. Again, the important thing is to include the Metadata header, just as with the MSI token service before.

In this end-2-end sample, I show how to call the in-VM instance metadata service from a GoLang application. Again, I just show the mechanics – no concrete scenario for this post – but it should equip you to implement scenarios such as the ones I’ve described throughout the post. And I plan subsequent blog posts that make use of these mechanics for a real scenario implementation. Below, again, is an excerpt of the GoLang code that retrieves instance metadata; for the full code, please review metadata.go:

const instanceMetaDataURL string =
          "http://169.254.169.254/metadata/instance?api-version=2017-04-02"

/*GetInstanceMetadata ()
 *Calls the Azure in-VM Instance Metadata service and returns the results to the caller*/
func GetInstanceMetadata() string {
    // etc. ...

    // Build a request to call the Azure in-VM instance metadata service
    req, err := http.NewRequest("GET", instanceMetaDataURL, nil)
    if err != nil {
        // etc. ...
    }

    // Set the required header for the HTTP request
    req.Header.Add("Metadata", "true")

    // Create the HTTP client and call the instance metadata service
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        // etc. ...
    }
    // Complete reading the body
    defer resp.Body.Close()

    if (resp.StatusCode >= 200) && (resp.StatusCode <= 299) {
        bodyContent, err := ioutil.ReadAll(resp.Body)
        // etc. ...
        return string(bodyContent)
    }
    // etc. ...
    return fmt.Sprintf("{ \"error\": \"instance meta data service returned non-OK status code: %d \" }", resp.StatusCode)
}
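
As a complement, here is a minimal sketch (hypothetical, not part of the sample on GitHub) of how the JSON string returned by GetInstanceMetadata() could be decoded into a struct to pull out the VM name and its first private IP address – exactly the kind of values an automated keepalived configuration would need. It assumes the JSON shape shown at the beginning of this post and uses only the standard library:

import (
    "encoding/json"
    "fmt"
)

// instanceMetadata models only the fields of the metadata JSON we care about here.
type instanceMetadata struct {
    Compute struct {
        Name     string `json:"name"`
        Location string `json:"location"`
        VMID     string `json:"vmId"`
    } `json:"compute"`
    Network struct {
        Interface []struct {
            IPv4 struct {
                IPAddress []struct {
                    PrivateIPAddress string `json:"privateIpAddress"`
                    PublicIPAddress  string `json:"publicIpAddress"`
                } `json:"ipAddress"`
            } `json:"ipv4"`
        } `json:"interface"`
    } `json:"network"`
}

// firstPrivateIP parses the metadata JSON and returns the VM name and its first private IPv4 address.
func firstPrivateIP(metadataJSON string) (string, string, error) {
    var md instanceMetadata
    if err := json.Unmarshal([]byte(metadataJSON), &md); err != nil {
        return "", "", err
    }
    if len(md.Network.Interface) == 0 || len(md.Network.Interface[0].IPv4.IPAddress) == 0 {
        return md.Compute.Name, "", fmt.Errorf("no IPv4 address found in instance metadata")
    }
    return md.Compute.Name, md.Network.Interface[0].IPv4.IPAddress[0].PrivateIPAddress, nil
}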

The Main Go-Application

Before putting it all together, let’s have a quick look at the main GoLang application so that you get a sense of where those previous pieces of code are called from. The main application is fairly simple: it bootstraps an HTTP server and configures some routes for the HTTP handlers (full source in main.go).

package main

import (
     "log"
     "net/http"
     "github.com/gorilla/mux"
)

var myRoutes = map[string]func(http.ResponseWriter, *http.Request){
        "/": Index,
        "/meta": MyMeta,
        "/servers": MyPeers}

func main() {
    router := mux.NewRouter().StrictSlash(true)
    for key, value := range myRoutes {
        router.HandleFunc(key, value)
    }
    log.Fatal(http.ListenAndServe(":8080", router))
}

The handlers.go file then contains the functions referred to in the map myRoutes defined in the source code above. These are the actual functions called when the respective route URLs are requested:

/*Index (w, r)
 *Returns with a list of available functions for this simple API*/
func Index(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintln(w, "Welcome!")
}

/*MyMeta (w, r)
 *Returns instance metadata retrieved through the in-VM instance metadata service of the VM*/
func MyMeta(w http.ResponseWriter, r *http.Request) {
    metaDataJSON := GetInstanceMetadata()
    fmt.Fprint(w, metaDataJSON)
}

/*MyPeers (w, r)
 *Uses the MSI to get a token and list all the other servers available in the resource group*/
func MyPeers(w http.ResponseWriter, r *http.Request) {
    token, err := GetMsiToken(50342)
    if err != "" {
        fmt.Fprint(w, err)
    } else {
        peerVms, err := GetMyPeerVirtualMachines(token)
        if err != "" {
            fmt.Fprint(w, err)
        } else {
            fmt.Fprint(w, peerVms)
        }
    }
}

Putting it all together

To make exploring this as easy as possible for you, the ARM templates and scripts I provide as part of this solution set up the entire environment automatically. To recall, here’s the screen shot of the entire environment from Azure Network Watcher again:

Network Watcher Topology

The ARM template sets up the network, virtual machines, network security groups etc., and to make it simple to explore the responses of the different servers without SSHing into the VMs, I also added a load balancer that exposes the GoLang application via port mapping for each of the servers. That means you can simply perform an HTTP request against the public load balancer using the port that maps to the server whose responses you would like to see.

Of course, you can also SSH into the Jump-Box set up as part of this deployment and explore everything from the inside. Essentially, what I do is the following as part of the ARM template deployment to automate the setup of the GoLang application:

  • The ARM template contains a custom script extension that runs on each of the servers to build the Go application and generate a shell script that registers the GoLang REST API I’ve explained above as a service daemon.
  • The service daemon script, which is generated as part of the server setup and copied to /etc/init.d/msiandmeta.sh, sets the subscription ID and the target resource group as environment variables before launching the GoLang application.

To keep the process simple and easy to follow, I use a template for the init.d script that gets generated by the custom script extension. This template is also on my GitHub repository, called template.msiandmeta.sh.

#!/bin/bash
### BEGIN INIT INFO
# Provides:          msiandmeta
# Required-Start:    $local_fs $network $named $time $syslog
# Required-Stop:     $local_fs $network $named $time $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: GoLang App using Azure MSI and Metadata
# Description:       Runs a Go Application which is a web server that demonstrates usage of Managed Service Identities and in-VM Instance Metadata
### END INIT INFO

appUserName=__USER__
appPath=__APP_PATH__
appName=__APP_NAME__

processIDFilename=$appPath/$appName.pid
logFilename=$appPath/$appName.log

#
# Starts the simple GO REST service
# 
start() {
    # Needed by the GO App to access subscription and resource group, correctly
    export SUBSCRIPTION_ID="__SUBSCRIPTION_ID__"
    export RESOURCE_GROUP="__RESOURCE_GROUP__"

    # Check if the service runs by looking at its Process ID and Log Files
    if [ -f $processIDFilename ] && [ "`ps | grep -w $(cat $processIDFilename)`" ]; then
        echo 'Service already running' >&2
        return 1
    fi
    echo 'Starting service...' >&2
    su -c "start-stop-daemon -SbmCv -x /usr/bin/nohup -p \"$processIDFilename\" -d \"$appPath\" -- \"./$appName\" > \"$logFilename\"" $appUserName
    echo 'Service started' >&2
}

#
# Stops the simple GO REST service
#
stop() {
    if [ ! -f $processIDFilename ] && [ ! "`ps | grep -w $(cat $processIDFilename)`" ]; then
        echo "Service not running" >&2
        return 1
    fi
    echo "Stopping Service..." >&2
    start-stop-daemon -K -p "$processIDFilename"
    rm -f "$processIDFilename"
    echo "Service stopped!" >&2
}

#
# Main script execution
#

case $1 in

    start)
      start
      ;;

    stop)
      stop
      ;;

    restart)
      stop
      start
      ;;

    *)
      echo "Usage: $0 start|stop|restart"
esac

In this script, you can see tokens such as __SUBSCRIPTION_ID__. These tokens are replaced by the script that’s executed at provisioning time for each of the servers through the custom script extension definition in the main ARM template for the entire solution:

{
    "apiVersion": "[variables('computeAPIVersion')]",
    "type": "Microsoft.Compute/virtualMachines/extensions",
    "name": "[concat(variables('serverVmNamePrefix'),copyIndex(),'/SetupScriptExtension')]",
    "location": "[parameters('location')]",
    "copy": {
        "name": "serverVmSetupExtensionCopy",
        "count": "[parameters('serverCount')]"
    },
    "dependsOn": [
        "[resourceId('Microsoft.Compute/virtualMachines',concat(variables('serverVmNamePrefix'), copyIndex()))]",
        "[concat('Microsoft.Compute/virtualMachines/', concat(variables('serverVmNamePrefix'),copyIndex()),'/extensions/IdentityExtension')]"
    ],
    "properties": {
        "publisher": "Microsoft.Azure.Extensions",
        "type": "CustomScript",
        "typeHandlerVersion": "2.0",
        "autoUpgradeMinorVersion": true,
        "settings": {
            "fileUris": [
                "[concat(parameters('_artifactsLocation'),'/scripts/setup_server_node.sh',parameters('_artifactsStorageSasToken'))]",
                "[concat(parameters('_artifactsLocation'),'/scripts/template.msiandmeta.sh',parameters('_artifactsStorageSasToken'))]",
                "[concat(parameters('_artifactsLocation'),'/app/main.go',parameters('_artifactsStorageSasToken'))]",
                "[concat(parameters('_artifactsLocation'),'/app/handlers.go',parameters('_artifactsStorageSasToken'))]",
                "[concat(parameters('_artifactsLocation'),'/app/metadata.go',parameters('_artifactsStorageSasToken'))]",
                "[concat(parameters('_artifactsLocation'),'/app/msitoken.go',parameters('_artifactsStorageSasToken'))]",
                "[concat(parameters('_artifactsLocation'),'/app/mypeers.go',parameters('_artifactsStorageSasToken'))]"
            ]
        },
        "protectedSettings": {
            "commandToExecute": "[concat('./setup_server_node.sh -a ', parameters('adminUsername'), ' -s ', subscription().subscriptionId, ' -r ', resourceGroup().name)]"
        }
    }
}

The script that’s invoked through the custom script extension above is also on my GitHub repository; it generates the final init.d script for the service registration based on its input parameters. These input parameters are the admin user name (under which the daemon should run), the subscription ID and the resource group name. Here’s an excerpt of setup_server_node.sh that builds the GoLang app and generates the target init.d script:

#
# Next compile the Go Application
#
mkdir ./app
mv *.go ./app

export PATH="$PATH:/usr/local/go/bin"
export GOPATH="`realpath ./`/app"
export GOBIN="$GOPATH/bin"
go get ./app
go build -o msitests ./app

sudo mkdir /usr/local/msiandmeta
sudo cp ./msitests /usr/local/msiandmeta
sudo chown -R $adminName:$adminName /usr/local/msiandmeta

#
# Generate the init.d script from the template by replacing the placeholder tokens
#
cat ./template.msiandmeta.sh \
| awk -v USER="$adminName" '{gsub("__USER__", USER)}1' \
| awk -v APP_NAME="msitests" '{gsub("__APP_NAME__", APP_NAME)}1' \
| awk -v APP_PATH="/usr/local/msiandmeta" '{gsub("__APP_PATH__", APP_PATH)}1' \
| awk -v SUBS="$subscriptionId" '{gsub("__SUBSCRIPTION_ID__", SUBS)}1' \
| awk -v RGROUP="$resGroup" '{gsub("__RESOURCE_GROUP__", RGROUP)}1' \
>> msiandmeta.sh

#
# Now make sure the script is handled by the system for starting/stopping the service
#
sudo cp ./msiandmeta.sh /etc/init.d
sudo chmod +x /etc/init.d/msiandmeta.sh
sudo update-rc.d msiandmeta.sh defaults

With that, the GoLang application that accesses the ARM REST APIs through MSI and the instance metadata service runs automatically and always finds the correct subscription ID and resource group name in its environment variables, since they’re set by the init.d script generated from the template this way!

Testing the environment

Once you have deployed the ARM template into your subscription, you should be able to call the GoLang application explained above – which demonstrates the mechanics of the instance metadata service and the Managed Service Identity in action – through the load balancer using the NAT ports for each server. Mapping each server through a port to the outside world was done for demo purposes, to make it as easy as possible for you to examine the responses of the different servers without SSHing into any machine. The following screen shot shows this in action by comparing responses from different servers.

Running the app in action

Of course, in the real world you would not expose these endpoints directly, but rather use them from within your applications! For this sample, and for helping you ramp up on the details quickly, it hopefully is useful.

Final Words

Managed Service Identities and the in-VM Instance Metadata Service are extremely helpful, and these kinds of capabilities were long overdue. Both services allow you to implement complex scenarios such as:

  • Implementing licensing and IP-protection strategies based on the in-VM instance metadata service.
  • Scripting automated configurations of clustered environments by being able to call into Azure Resource Manager REST APIs from within Virtual Machines without the need to manage secrets for Service Principals.
  • Many more similar scenarios.

With both services available on Azure, my previous blog post becomes obsolete for this specific scenario, although there are still many reasons for leveraging service principals in other scenarios, of course (so it might still be a good source for learning about service principals in Azure AD in general). But the specific scenario outlined in both that previous post and this one can be implemented far better with Managed Service Identities and the in-VM Instance Metadata Service combined!

I hope you enjoyed reading this and found it valuable. With one of my customers, we went through a concrete scenario that leverages these mechanics in a very similar way… my plan is to post about that concrete scenario as one of my next blogging activities.

Stay Tuned!

Single-Sign-On with SAP HANA, Azure Active Directory and Office 365

Single-Sign-On with Azure Active Directory for HANA

At the last SAP Sapphire (May 2017) we announced several improvements and new offerings for SAP on Azure, as you can read here. The most prominent ones are more HANA certifications as well as SAP Cloud Platform on Azure (as you can read in my last blog post specifically focused on SAP CP).

One of the less discussed and visible announcements, despite being mentioned, is the broad support of Enterprise-Grade Single-Sign-On across many SAP technologies with Azure Active Directory. This post is solely about one of these offerings – HANA integration with Azure AD.

Pre-Requisites for HANA / AAD Single-Sign-On

An integration of HANA with Azure AD (AAD) as the primary Identity Provider works for HANA instances you can run anywhere you want (on-premises, any public IaaS, Azure VMs or SAP Large Instances in Azure). The only requirement is that the end user accessing apps (Web Administration, XSA, Fiori) running inside of the HANA instance has access to the Internet to be able to sign in against Azure AD.

For this post, I start with an SAP HANA Instance that runs inside of an Azure Virtual Machine. You can deploy such HANA instances either manually or through the SAP Cloud Appliance Library.

In addition to just running HANA, I’ve also installed XRDP on the Linux VM in Azure and SAP HANA Studio inside of the Virtual Machine to be able to perform the necessary configurations across both the XSA Administration Web Interface and HANA Studio, as needed.

Finally, you need to have access to an Azure Active Directory tenant for which you are the Global Administrator or have the appropriate permissions to add configurations for Enterprise Applications to that Azure AD Tenant!

The following figure gives an overview of the HANA VM environment I used for this blog post. The important part is the Azure Network Security Group, which opens up the HTTP and HTTPS ports for HANA; these follow the pattern 80xx for regular HTTP and 43xx for HTTPS.

HANA VM in Azure Overview

Azure Active Directory Marketplace instead of manual configuration

SAP HANA is configured through the Azure Active Directory Marketplace rather than the regular App Registration model used for custom-developed apps in Azure AD. There are several reasons for this; here are the most important ones:

  • SAML-P is required. Most SAP assets follow SAML-P for web-based Single Sign-On. While it is possible to set this up manually in Azure AD with advanced options, that requires Azure AD Premium Edition. For offerings from the Azure AD Marketplace (Gallery), Standard Edition is sufficient. While that’s not the primary reason, it’s a neat one!
  • Entity Identifier formats for SAP assets. When registering an application in Azure AD through the regular App Registration model, Application IDs (Entity IDs in federation metadata documents) are required to be URNs with a protocol prefix (xyz://…). SAP applications use Entity IDs with arbitrary strings not following any specific format, hence a regular app registration does not work. Again, this challenge can be solved through the Enterprise App Integration in AAD Premium. But when taking the pre-configured offering from the Marketplace, you don’t need to take care of such things!
  • Name ID formats in issued SAML tokens. Users are typically identified using Name ID assertions (claims). In requests, Azure AD accepts nameid-format:persistent, nameid-format:emailAddress, nameid-format:unspecified and nameid-format:transient. All of these are documented here in detail. Now, the challenge here is:
    • HANA sends requests with nameid-format:unspecified.
    • This leads to Azure AD selecting the format for uniquely identifying a user.
    • But HANA expects the Name ID claim to contain the plain user name (johndoe instead of domain\johndoe or johndoe@domain.com).
    • This leads to a mismatch and HANA not detecting the user as a valid user even if the user exists inside of the HANA system!

    The Azure AD Marketplace item is configured and on-boarded in a way that resolves this technical challenge.

  • Pre-configured claims. While that’s not a need for HANA specifically, for most of the other SAP-related offerings the marketplace-based integration pre-configures the SSO configuration with the claims/assertions typically required by the respective SAP technology.

Step #1 – Register HANA in Azure Active Directory

Assuming you have HANA running in a VM as I explained earlier in this post, the first step to configure Azure AD as an Identity Provider for HANA is to add HANA as an Enterprise Application to your Azure AD Tenant. You need to select the offer as shown in the screen shot below:

Selecting the HANA AAD Gallery Offering

In the first step, you just need to specify a Display Name for the app as shown in the Azure AD management portal. The details are configured later as part of the next steps. You can also get more detailed instructions directly from within the Azure AD management portal: just open the Single Sign-On section, select SAML-based Sign-On in the dropdown box at the very top, then scroll to the bottom and click the button for detailed demo instructions.

Detailed Demo instructions for SAML-P

If you’re filling out the SAML-P Sign-In settings according to these instructions, you’re definitely on a good path. So, let’s just walk through the settings so you get an example of what you need to enter there:

  • Identifier: should be the Entity ID which HANA uses in its Federation Metadata. It needs to be unique across all enterprise apps you have configured. I’ll show you later in this post where you can find it; essentially you need to navigate to HANA’s Federation Metadata in the XSA Administration Web Interface.
  • Reply URL: use the XSA SAML login endpoint of your HANA system for this setting. My Azure VM had a public IP address bound to the Azure DNS name marioszpsaphanaaaddemo.westeurope.cloudapp.azure.com, therefore I had to configure https://marioszpsaphanaaaddemo.westeurope.cloudapp.azure.com:4300/sap/hana/xs/saml/login.xscfunc for it.
  • User Identifier: this is one of the most important settings and you must not forget it. The default, user.userprincipalname, will NOT work with HANA. You need to select the function ExtractMailPrefix() in the dropdown and select user.userprincipalname for the Mail parameter of this function.

Detailed Settings Visualized

Super-Important: Don’t ignore the information-message shown right below the certificate list and the link for getting the Federation Metadata. You need to check the box Make new certificate active so that the signatures will be correctly applied as part of the sign-in process. Otherwise, HANA won’t be able to verify the signature.

Step #2 – Download the Federation Metadata from Azure AD

After you have configured all settings, you need to save the SAML configuration before moving on. Once saved, you need to download the Federation Metadata for configuring SSO with Azure AD within the HANA administration interfaces. The previous screen shot highlights the download button in the lower-right corner.

Downloading the federation metadata document is the easiest way to get the required certificate and the name / entity identifier configured in your target HANA system.

Step #3 – Login to your HANA XSA Web Console and Configure a SAML IdP

We have done all required configurations on the Azure AD side for now. As a next step, we need to enable SAML-P Authentication within HANA and configure Azure AD as a valid identity provider for your HANA System. For this purpose, open up the XSA web console of your HANA System by browsing to the respective HTTPS-endpoint. For my Azure VM, that was:

https://marioszpsaphanaaaddemo.westeurope.cloudapp.azure.com:4300/sap/hana/xs/admin

Of course, HANA will still redirect you to a forms-based login page because we have not configured SAML-P yet. So, sign in with your current XSA Administrator account to start the configuration.

Tip: take note of the Forms-Authentication URL. If you break something in your SAML-P configuration later down the road, you can always use it to sign back in via Forms Authentication and fix the configuration! The respective URL to take note of is: https://marioszpsaphanaaaddemo.westeurope.cloudapp.azure.com:4300/sap/hana/xs/formLogin/login.html?x-sap-origin-location=%2Fsap%2Fhana%2Fxs%2Fadmin%2F.

Now the federation metadata document downloaded in Step #2 above becomes relevant. In the XSA Web Interface, navigate to SAML Identity Providers and from there click the “+” button at the bottom of the screen. In the form that opens, just paste the downloaded federation metadata document into the large text box at the top of the screen. This will do most of the remaining job for you! But you still need to fix a few fields:

  • The name in the General Data must not contain any special characters or spaces.
  • The SSO URL is not filled by default since we don’t have it in the AAD metadata yet. You need to fill it manually as per the guidance from within the Azure AD portal shown earlier in this post.

HANA SAML IdP Data Filled

Since we are in the HANA XSA tool, it’s the right point in time to show you where I retrieved the information required earlier in the Azure AD portal when registering HANA as an app there – the Identifier shown in the last screen shot from the Azure AD console above.

Indeed, these details are retrieved from the SAML Service Provider configuration section as highlighted in the screen shot below. A quick side note: this is one of the rare cases where I constantly needed to switch to Microsoft Edge as a browser instead of Google Chrome. For some reason, in Chrome I was unable to open the metadata tab, while in Edge I typically can open it; it shows the entire Federation Metadata document for this HANA instance. From there, you can also grab the identifier required for Azure AD, since this is the Entity ID inside of the Federation Metadata document.

HANA SAML Federation Metadata

Ok, we have configured Azure AD as a valid IdP for this HANA system. But we have not really enabled SAML-based authentication for anything yet. This happens at the level of applications managed by the XS environment inside of HANA (that’s how I understand it with my limited HANA knowledge:)). You can enable SAML-P on a per-package basis inside of XSA, which means it’s fully up to you to decide for which components you plan to enable SAML-P and for which you stay with other authentication methods. Below is a screen shot that enables SAML-P for an SAP-provided package! But a word of warning: if you enable SAML-P for those, this might also have an impact on other systems interacting with those packages. They should probably also support SAML-P as a means of authentication, especially if you disable the other options entirely!

HANA SAML Federation Metadata

By enabling the sap-package for SAML-P, we get SSO based on Azure AD for a range of built-in functions including the XSA web interface, but also Fiori-interfaces hosted inside of the HANA instance for which you configured the setting.

Step #4 – Troubleshooting

So far so good – seems we could try it out, right? So, let’s log out, open an in-private browsing session in your browser of choice and navigate to your HANA XSA Administration application again. You will see that this time, by default, you get redirected to Azure AD for signing into the HANA system. Let’s see what happens when trying to log in with a valid user from the Azure AD tenant.

HANA SAML Federation Metadata

Seems the login was not so successful. The big question is why. This is where we need access to the HANA system with HANA Studio and access to the system’s trace log. For my configuration, I installed XRDP on the Linux machine and have HANA Studio running directly on that machine. So, the best way to start is connecting to the machine, starting HANA Studio and navigating to the system configuration settings.

HANA Diagnosis for Sign-In Failing

The error message is kind of confusing and misleading, though. We spent some time when onboarding HANA into the AAD Marketplace figuring out what was going wrong. So much in advance – Fiddler traces and issues with certificates were not the problem! The resolution is to be found in an entirely different section. Nevertheless, I wanted to show this here, because it really is extremely valuable to understand how to troubleshoot when things are not going well.

The main reason for this failure is a mismatch in timeout configurations. The signatures are created based on some timestamps. One of those timestamps is used to ensure that authentication messages are valid for only a given amount of time. That time is set to a very low limit in HANA by default, resulting in this quite misleading error message.

Anyway, to fix it, you need to stay in the HANA system-level properties within HANA Studio and make some adjustments. Within the system properties on the Configuration tab, just filter the settings by SAML and adjust the assertion_timeout setting. It’s impossible to complete an entire user-driven sign-in process within 10 seconds. Think about it: the user navigates to a HANA app, gets redirected to Azure AD, needs to enter her/his username and password, then there may even be Multi-Factor Authentication involved, and finally, upon success, the user gets redirected back to the respective HANA application. Impossible within 10 seconds. So, in my case, I adjusted it to two minutes.

HANA Diagnosis for Sign-In Failing

By the way, this behavior and the required configuration are also documented in an official SAP Support Note as an outcome of the collaboration between us and the SAP HANA team as part of enabling SSO with Azure AD (thanks for the great collaboration, again:)):

SAP Support Note 2476310 – SAML SSO to HANA System Using Microsoft AAD – Your Browser Shows the Error “Assertion did not contain a valid Message ID”

Ok, time for the next attempt. Now, if you still get the same error message about not being able to validate the signature, you probably forgot something earlier in the game. Make sure that when configuring HANA in Azure AD you make the certificate active by checking the Make new certificate active checkbox I’ve mentioned earlier… below is the same screen shot with the important informational message, again!

Don't forget Make new certificate active

Step #5 – Configuring a HANA Database User

If you’ve followed all the steps so far, the sign-in with a user from Azure AD will still not succeed. Again, the trace logs from HANA give more insight into what’s going on and why the sign-in is failing this time.

Trace about User does not exist in HANA

HANA is complaining that it does not know about the user. This is a fair complaint since Azure AD (or any other SAML Identity Provider) takes care of authentication only. Authorization needs to happen in the actual target system (the service provider, often also called the relying party application). To be able to authorize, the user needs to be known to the service provider. That means at least some sort of user entity needs to be configured.

  • With HANA, that means you essentially create a database user and enable Single Sign-On for this database user.
  • HANA then uses the NameID assertion from the resulting SAML token to map the user authenticated by the IdP – Azure AD in this case – to a HANA database user. This is why the format of the NameID in the issued token is so important and why we had to configure the ExtractMailPrefix() strategy in the Azure AD portal as part of Step #1.

So, to make all of this happen and finally get to a successful login, we need to create a user in HANA, enable SSO and make sure that user has the appropriate permissions in HANA to e.g. access Fiori Apps or the XSA Administration Web Interface. This happens in HANA Studio, again.

Detailed Settings Visualized

Super-Important: The left-most part of the figure above visualizes the mapping from the SAML-Token’s perspective. So it defines the IdP as per the previous configurations and the user as it will end up being set in the NameID-assertion of the resulting SAML-token. With Azure AD users, these will mostly be lower-case! Case matters here!!! Make sure you enter the value lower-case here, otherwise you’ll get a weird message about dynamic user creation failing!!!

The next step is to make sure that the user has the appropriate permissions. As a non-HANA-expert, I just gave the user all permissions to make sure I could show success as part of this demo. Of course, that’s not a best practice; you should grant only the permissions appropriate for your use cases.

Detailed Settings Visualized

Step #6 – A Successful Login

Finally, we made it! If you have completed all the steps above, you can start using HANA with full Single Sign-On across applications integrated with your Azure AD tenant. For example, the screen shot below shows my globaladmin user account signing into the HANA test VM I used, navigating to the HANA XSA Administration web console and then navigating from there to Office 365 Outlook… It all works like a charm without me being required to enter credentials again!

Detailed Settings Visualized

That is kind-of cool, isn’t it! It would then even work with navigating back and forth between those environments. Now, this scenario would work for any application that runs inside of the XS-environment.

But for now, at least for enterprise administrators, it means they can secure very important parts of their HANA systems with a proven identity platform by using Azure AD. They can even configure Multi-Factor Authentication in Azure AD and thus protect HANA environments even further, alongside the other applications using the same Azure AD tenant as an Identity Provider.

Final Words

Finally, this is the simplest possible way of integrating Single Sign-On with SAP applications using Azure AD only. SAP NetWeaver would be similarly simple, as documented here. There’s even a more detailed tutorial available for the Fiori Launchpad on NetWeaver, based on these efforts, which we’ve published on the SAP blogs here.

The tip of the iceberg is the most advanced SSO we’ve implemented with SAP Cloud Platform Identity Authentication Services. This gives you centralized SSO management through both companies’ Identity-as-a-Service offerings (Azure AD, SAP Cloud Platform Identity Services). As part of that offering, SAP even includes automated identity provisioning, which removes the need for manually creating users as we did above.

I think we achieved a lot over the past year with the partnership between SAP and Microsoft. But if you ask for my personal opinion, the most significant achievements are HANA on Azure (of course, right:)), SAP Cloud Platform on Azure and … the Single Sign-On offerings across all sorts of SAP technologies and services!

I hope you found this super-interesting. It is most probably my last blog post as a member of the SAP Global Alliance Team from the technical side, since I am moving forward to the customer-facing part of Azure Engineering (the Azure Customer Advisory Team) as an engineer. Still, I am part of the family and will engage with SAP as needed in my new role, that’s for sure!

Developed an SAP HANA Express Azure Quick Start Template

Context

This is an exciting week for me… although I am usually not that much into attending Business-oriented conferences, over the past two years I did so by attending SAP Sapphire.

Up until now, that has mainly been because I’ve contributed some aspects to what was announced about the partnership between Microsoft and SAP at the conference. Last year, my part was mainly about Office 365, with the work I supported for Concur, Ariba, Fieldglass and SuccessFactors, as well as the Sports Basement demo shown in the keynote to highlight the HANA One certification for Azure on our DS14 VM series at that time.

While I cannot write about the major project I supported for this year, yet, here’s a little nugget to get started with – a Quick Start Template for SAP HANA Express!

Important Note: While I am working for Microsoft, this blog summarizes my personal opinions and my personal understanding of topics. This means that all I am writing here is not related to Microsoft’s official opinion, at all! If you want to get that view, look at the Azure Blog or Jason Zander’s Blog for official announcements!

Important Note: The Pull-Request into the official Microsoft Azure Quick Start Templates GitHub repository is not completed, yet. Therefore, the link still redirects to the working branch in my GitHub repository. I’ll update the blog-post as soon as the pull request is completed (there’s currently a general issue with the Travis CI pipeline used for validation on the Azure Quick Start Templates GitHub repository that caused an unexpected delay for the pull request to go through in time for Sapphire).

SAP HANA Express

As you might know already, SAP HANA is SAP’s in-memory database engine which is supposed to back all major future releases of SAP’s core business suites centered around S/4HANA. But HANA can also be used as a stand-alone database system for developing custom solutions. The Sports Basement demo on Azure DS14 instances from last year’s Sapphire keynote is an example of that: it was a plain HANA database fronted by a Java web application.

Now, between last year’s Sapphire and now, SAP released a version of SAP HANA that is free for development and testing purposes up to 32GB of RAM, called HANA Express.

For more details, you should navigate to the SAP HANA Express Homepage to get the full picture and the official view:

https://www.sap.com/developer/topics/sap-hana-express.html

Azure Quick-Start-Templates

Now, shortly before Sapphire, some folks from Microsoft and SAP approached me about creating an Azure Marketplace image for HANA Express. That’s something we’re working on, but it is not something that can be done in just a few days, and the timeline was too short. But since HANA Express clearly addresses developers, I thought a good solution that can be implemented within a few days is an Azure Quick-Start-Template.

For those of you who are new to Azure: Azure Quick-Start-Templates are open-source Azure Resource Manager templates and deployment scripts which can be used to quickly spin up solutions on Azure. Many of those are used as a learning resource, but some of them can definitely be used for dev/test scenarios or even as a starting point for production scenarios.

All of these are available under the following two links:

Now, the main point with these quick start templates is that they automate most of the setup/provisioning procedure by using scripts and templates, so you can get started quickly.

SAP HANA Express Quick Start Template

I decided to build such a template for HANA Express and make it available as part of the Quick-Start-Templates. This Quick-Start is built and tested with SAP HANA Express 2.0 SPS1; it should also work with other versions, but I’ve only tested it with this one.

http://aka.ms/sap-hana-express-quickstart

There’s just one caveat: since SAP requires you to accept the EULA for SAP HANA Express, you first need to go to SAP’s HANA Express home page, register, and download the SAP HANA Express setup images manually before you can start using the Quick Start Template I’ve created. From there on, everything else is handled by the template. The basic workings of the template are:

  1. First you register with SAP and download the HANA Express Setup Package.
  2. Then you use the quick start template to upload the setup package into your private Azure Storage Account.
  3. From that moment forward you can use the Azure Resource Manager Template included in the template to deploy as many HANA Express Instances into your own subscription as you want.

Starting with Step 2, everything is automated with scripts and templates. That means only the first step, downloading the setup packages and accepting the EULA at the SAP HANA Express setup home page, is something you need to do manually. Sure, a Marketplace image would be more convenient, but we’ll work with SAP on that…

All the details are explained in the sections below in this blog post, including how you can validate at the end of the entire process that the installation really went well.

Important Note: Please don’t forget that quick start templates are not backed by Microsoft Support in any way. They are there to help you get started on your own, they are fully open source, and they are maintained on a best-effort basis!

Requirements

Before moving on, all you need on your local machine are the following assets/tools:

Register and Download the HANA Express Setup from SAP

Yes, that’s the first step. It needs to be done for two reasons: first, you need to accept SAP’s EULA, and second, SAP will inform you about important updates and service releases for HANA when you register at the registration page.

That said, the first thing to do is navigate to the SAP HANA Express home page to register with SAP and accept the EULA:

HANA Express

HANA Express Download Manager Option

Next, you’ll need to use the SAP Download Manager to download the SAP HANA Express setup packages. The Download Manager exists as a native version for Linux or Windows, or as a cross-platform version for Java (distributed as a JAR package). I’ve used the JAR-package version, but it should not matter at all.

What matters is selecting the right type of setup packages. The quick-start template I’ve built is tested for the server-only version, without XS Advanced services. So you should select the following for download when using the SAP HANA Express Download Manager:

HANA Express Server Only Download Option

The downloaded setup package will appear as a TAR archive in your local Downloads folder (or wherever you downloaded it to). It should be called something along the lines of hxe.tgz.
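
If you want to quickly verify the download before moving on, a simple check like the following works (the path is just an example; adjust it to wherever your browser or the Download Manager stored the file):

# Sanity-check the downloaded archive (example path - adjust to your download location)
ls -lh ~/Downloads/hxe.tgz
tar -tzf ~/Downloads/hxe.tgz | head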

Upload the HANA Express to your Azure Storage Account

For automating the setup procedure of SAP HANA Express inside of an Azure Virtual Machine, the HANA Express setup packages need to be available for download by an automation script. To avoid uploading the setup package (~1.6GB of data) with each new VM, the best approach is to upload it once into an Azure storage account and use a shared access signature to feed the setup files into the provisioning script for the Virtual Machines.

Now, performing those steps can be done manually by using a tool such as the Azure Storage Explorer. But I decided to automate the procedure using the Azure CLI 2.0.

Assuming that you have downloaded the SAP HANA Express setup package locally (e.g. to /mnt/c/temp/hxe.tgz), you can execute the following commands; adjust the last parameter to the actual path of your hxe.tgz:

# Login into your Azure Subscription using the Azure CLI 2.0
az login

# Create a resource group for the storage account
az group create --name "sampleresourcegroupname" --location "westeurope"

# Upload the HANA Express Setup Files to your Azure Storage Account
./prepare-hxe-setup-files.sh sampleresourcegroupname samplestorageaccountname samplecontainer westeurope /home/mydirectory/hxe.tgz

Now, as I mentioned before, the automated setup of HANA Express happens in a script that runs in the post-provisioning phase of the Azure VM. That script needs to be able to download the setup files without user interaction. To enable that scenario, the prepare-hxe-setup-files.sh script of my Quick Start uploads the setup packages for HANA Express to an Azure Storage Account and generates a Shared Access Signature URL, which allows the packages to simply be downloaded with wget or a similar shell tool, using the signature as the means of authentication.

The following screen shot shows the output of the prepare-hxe-setup-files.sh script. You should especially take note of the Shared Access Signature URL the script outputs at the end!

Output of prepare-hxe-setup-files.sh
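
If you are curious what the script roughly does under the hood, the following is a simplified sketch of the core Azure CLI 2.0 steps. The storage account, container, file path and expiry date below are placeholder values, and the real prepare-hxe-setup-files.sh handles key lookup, re-uploads and error cases in more detail:

# Create the storage account (names below are placeholders)
az storage account create --name "samplestorageaccountname" \
                          --resource-group "sampleresourcegroupname" \
                          --location "westeurope" --sku Standard_LRS

# Grab a storage key, create a container and upload the setup archive
storageKey=$(az storage account keys list --account-name "samplestorageaccountname" \
                                          --resource-group "sampleresourcegroupname" \
                                          --query "[0].value" -o tsv)
az storage container create --name "samplecontainer" \
                            --account-name "samplestorageaccountname" --account-key "$storageKey"
az storage blob upload --container-name "samplecontainer" --name "hxe.tgz" \
                       --file "/home/mydirectory/hxe.tgz" \
                       --account-name "samplestorageaccountname" --account-key "$storageKey"

# Generate a read-only Shared Access Signature (expiry date is an example)
az storage blob generate-sas --container-name "samplecontainer" --name "hxe.tgz" \
                             --permissions r --expiry "2018-06-30T00:00Z" \
                             --account-name "samplestorageaccountname" --account-key "$storageKey"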

Deploy an SAP HANA Express using the templates

With the Setup packages for SAP HANA Express uploaded to an Azure Storage Account and the Storage Shared Access Signature generated as mentioned above, you can deploy as many SAP HANA Express Virtual Machines as you need to.

Important Note: the prepare-hxe-setup-files.sh script above generates Shared Access Signatures that are valid for one year. That means after a year you need to run the script again to generate a new Shared Access Signature. Note that the script is smart enough to detect whether the files have been uploaded already; if so, it generates the signature for the existing blob instead of uploading again!

When using the quick-start template, you can either use the “Deploy-To-Azure” button presented on the landing page of the Quick Start, or you fill out the parameters in the azuredeploy.parameters.json file as shown below and deploy the template via PowerShell or the Azure CLI:

Parameters filled in the azuredeploy.parameters.json

After you’ve filled out the parameters (the screen shot above shows the minimum ones you need to fill out), you can move ahead and deploy the template using code similar to the following:

# Create a resource group for your HANA Express VM Resources
az group create --name "samplehanaexpressgroup" --location "westeurope"

# Deploy the template with the filled parameters file
az group deployment create --resource-group="samplehanaexpressgroup" \
                           --template-file="azuredeploy.json" \
                           --parameters="@azuredeploy.sample.parameters.json" \
                           --name="samplehanaexpress"

The output of that script should look similar to the following screen shot:
Output of the template deployment

Validating the Installation

So far so good. If the output looks similar to the screen shot above, then you should be all set! But you can of course validate your installation, in two ways: use the regular HANA tools to see if your instance is responsive, or look at the installation logs from the provisioning process.

For that, you need to understand some background. I am using the Azure Custom Script Extension for Linux to automatically execute the HANA installation, with all required prerequisites, during the post-provisioning phase of the Azure Virtual Machine. That is expressed in the Azure Resource Manager template with the following code:

... REST OF THE TEMPLATE ...

    {
        "type": "extensions",
        "name": "hxeinstallextension",
        "apiVersion": "2016-04-30-preview",
        "location": "[resourceGroup().location]",
        "dependsOn": [
            "[concat('Microsoft.Compute/virtualMachines/', parameters('vmNamePrefix'))]"
        ],
        "properties": {
            "publisher": "Microsoft.Azure.Extensions",
            "type": "CustomScript",
            "typeHandlerVersion": "2.0",
            "autoUpgradeMinorVersion": true,
            "settings": {
                "fileUris": [
                    "[parameters('hxeInstallScriptUrl')]"
                ]
            },
            "protectedSettings": {
                "commandToExecute": "[concat('sudo ./', parameters('hxeInstallScriptName'), ' \"', parameters('hxeSetupFileUrl'), '\" \"', parameters('hxeMasterPwd'), '\" && exit 0')]"
            }
        }
    }

... REST OF THE TEMPLATE ...

This part of the template shows that, after the Virtual Machine resource has been provisioned, the Azure Virtual Machine Agent is used to run the script specified in the template. This script is downloaded directly from the Quick-Start-Templates GitHub repository, so no further steps are needed to enable this.

If you now want to validate whether the installation script for SAP HANA Express ran successfully, you should first review the deployment logs within the Azure Portal, similar to what’s shown in the following screen shot:

Azure Portal Deployment Log

If you still need to see more, just SSH into the created virtual machine (refer to the DNS name specified in azuredeploy.parameters.json as per the screen shots above) and output the content of the stdout and stderr files within the /var/lib/waagent/custom-script/download/0 directory, similar to what’s shown in the following screen shot:

Virtual Machine Deployment Log
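
If you prefer the plain commands over the screen shot, this is essentially what I do (user name and DNS name are placeholders for the values you chose in the parameters file):

# SSH into the VM using the admin user and DNS name from azuredeploy.parameters.json
ssh youradminuser@yourdnsname.westeurope.cloudapp.azure.com

# Inside the VM: dump the logs written by the custom script extension
sudo cat /var/lib/waagent/custom-script/download/0/stdout
sudo cat /var/lib/waagent/custom-script/download/0/stderr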

When you look at the output, you’ll quickly realize how much this really accelerates you. The setup script automatically performs the following steps for you (a quick check that HANA itself is responsive is sketched right after the list):

  • Install the needed Oracle JDK on your machine
  • Install required library packages using zypper
  • Download and Extract the HANA Express Setup Packages to the VM
  • Install HANA Express on the VM
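
As a quick way to verify that the HANA instance itself is responsive, you can list the HANA processes from inside the VM. This assumes the default hxeadm administration user the HANA Express installer creates; adjust the user if your setup differs:

# List the running HANA processes (hdbdaemon, hdbnameserver, hdbindexserver, ...)
sudo su -l hxeadm -c "HDB info"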

Final Words

The post above went quite deep into the details of every step of how the quick-start works. But the essence can literally be done within about 30 minutes, depending on how fast your Internet connection is:

  • Register for an Azure Subscription if you don’t have one, yet
  • Setup the Azure CLI 2.0 on your machine if not done so, yet
  • Register with SAP
  • Download the SAP HANA Express Setup Packages
  • Execute the script prepare-hxe-setup-files.sh and note the generated Shared Access Signature
  • Click the “Deploy-to-Azure Button” or update the parameters file and execute az group deployment create

All of the needed prerequisites, and of course SAP HANA Express itself, get set up for you, and within about 30 minutes you have an instance of it running in Microsoft Azure. I hope you find this valuable and that it helps you accelerate the setup of dev/test environments with SAP HANA Express, since you don’t need to walk through all of the needed setup steps manually.

Azure Virtual Machines – A Solution for Instance Metadata in Linux (and Windows) VMs

At SAP Sapphire we announced the availability of SAP HANA on Azure. My little contribution to this was working on a case that was shown as a demo in the keynote at SAP Sapphire 2016: Sports Basement with HANA on Azure. It was meant as a show-case and proof for running HANA One workloads in Azure DS14 VMs, and it was the first productive HANA-on-Azure case outside of the SAP HANA on Azure Large Instances.

While we proved we can run HANA One in DS14, what’s still missing is the official Marketplace image. We are working on that on-boarding of HANA One into the Azure Marketplace at the time I am writing this post. This post is about a very specific challenge which I know many others need solved, as well. While Azure will have a built-in solution, it is not available today (August 2016), so this might be of help for you!

Scenario: A VM reading and modifying data about itself

This is a very common scenario; HANA One needs it as well. On other cloud platforms, especially AWS, a Virtual Machine can query information about itself without any hurdles through an instance metadata service. On Azure, as powerful as it is, we don’t have such a service available yet (as of August 2016). To be precise, we do, but it currently delivers information about regular maintenance only; see here for further details. While such a service is in the works, it is not available yet.

Instance metadata is especially interesting for software providers which want to offer their solutions through the marketplace. The metadata can be used for various aspects including association and validation of licenses or protection of software assets inside of the VM.

But what if a VM needs to modify settings about itself through Cloud Provider Management APIs, automatically? Even with an instance metadata service available, such requirements need a more advanced approach.

Solution: A possible approach outlined (and code on my GitHub Repo)

Based on that, I started thinking about this challenge, prototyping it and sharing it with the broader technical community. With Azure having the concept of Service Principals available, I tried the following path:

  1. If we could pass in a Service Principal at the creation of the VM, we’d have all we need to call into Azure Resource Manager APIs.
  2. The VM can identify itself through its “Unique VM ID”. So we could query the Azure Resource Manager APIs and find the VM based on this ID.
  3. For Marketplace use cases it is necessary, that the user is FORCED to enter the credentials. So an ARM template with mandatory parameters for passing in the details for the Service Credential is needed.

With this in place we can solve both problems with a single solution: with the right permissions equipped, a Service Principal can query instance metadata through Azure Resource Manager APIs and modify virtual machine settings at the same time. Indeed, the Azure Cloud Foundry Bosh solution uses that approach as well, although it does not need to “identify” virtual machines. It just creates and deletes them…

For most Marketplace vendors, including the case above, the VM needs to change details about itself. So there would need to be a way for the VM to find itself through the VM Unique ID. Since nobody was able to answer the question of whether that’s possible, I prototyped it with the Azure CLI.

Important Note: This is considered to be a prototype to prove that what is outlined above generally works. For production scenarios you’d need to code this with professional frameworks, better protect secrets by using those, and build this into your product.

GitHub Repository: I’ve prototyped the entire solution and published it on my GitHub Repository here:

–>> https://github.com/mszcool/azureSpBasedInstanceMetadata

Step #1: Create a Service Principal

The first step is creating a Service Principal. That is not an easy task, especially when you think about offerings in a Marketplace where business people want to have fast and simple on-boarding.

Guess why I’ve created this solution prototype on my GitHub repository (with a blog post that followed). The idea of this prototype is to provide a ready-to-use service that creates Service Principals in your own subscription.

I still run this on my Azure Subscription, so if you need a Service Principal and you don’t like scripting, just use my tool for creating it. Note: please use in-private browsing and sign-in with a Global Admin (or get a Global Admin who does an Admin-Consent for my tool in your tenant).

If you love scripting, then you can use tools such as the Azure PowerShell or the Azure Cross Platform CLI. In my prototype, I built the entire set of scripts with the Azure CLI and tested it on Ubuntu Linux (14.04 LTS). Even cooler, I indeed developed and debugged all the Scripts on the new Bash on Ubuntu on Windows:
Bash on Windows

The script createsp.sh is a sample script which creates a Service Principal and assigns the needed roles to it so it can read VM metadata in the subscription (it would be better to scope this to just the resource group in which you want to create the VM… I kept it like that for convenience).

# Each Service Principal in Azure AD is backed by an 'Application-registration'
azure ad app create --name "$servicePrincipalName" \
                    --home-page "$servicePrincipalIdUri" \
                    --identifier-uris "$servicePrincipalIdUri" \
                    --reply-urls "$servicePrincipalIdUri" \
                    --password $servicePrincipalPwd

# I use JQ to extract data out of JSON results such as the AppId
createdAppJson=$(azure ad app show --identifierUri "$servicePrincipalIdUri" --json)
createdAppId=$(echo $createdAppJson | jq --raw-output '.[0].appId')

azure ad sp create --applicationId "$createdAppId"

Note: I created the App and the Service Principal separately since the AppID is needed anyway to log in with the Service Principal using the Azure CLI, and I also needed to read both the App and the Service Principal Object IDs.

Note: JQ is really a handy command line tool to extract data from the neat JSON-responses of the Azure CLI. Take a look at further details here.

After the Service Principal and the App are both created, I can assign the roles to the Service Principal so that it can query the VM metadata in my subscription:

# If I would create the resource group earlier, I could use the
# --resource-group switch instead of the --subscription switch here to scope
# permissions to the resource group of the VM to-be-created, only.
azure role assignment create --objectId "$createSpObjectId" \
                             --roleName Reader \
                             --subscription "$subId" 
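
For completeness, the narrower assignment mentioned in the comment above would look roughly like this (a sketch, assuming the resource group already exists; $resourceGroupName is a placeholder variable for the group that will hold the VM):

# Scope the Reader role to a single resource group instead of the whole subscription
azure role assignment create --objectId "$createSpObjectId" \
                             --roleName Reader \
                             --resource-group "$resourceGroupName"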

Finally, to complete the work, I needed the Tenant ID of the Azure AD tenant for the target subscription, which is also needed for logging in with a Service Principal using the Azure CLI. Indeed, the following code snippet is at the very beginning of the createsp.sh script:

# Get the entry for the target subscription
accountsJson=$(azure account list --json)

# The Subscription ID is needed throughout the script
subId=$(echo $accountsJson | jq --raw-output --arg pSubName $subscriptionName '.[] | select(.name == $pSubName) | .id')

# Finally get the TenantID of the Azure AD tenant which is associated to the Azure Subscription:
tenantId=$(echo $accountsJson | jq --raw-output --arg pSubName $subscriptionName '.[] | select(.name == $pSubName) | .tenantId')

With those data assets in place (the tenantId, the appId and the password selected for the app creation), we can log in with the Service Principal using the Azure CLI as follows:

azure telemetry --disable
azure config mode arm
azure login --username "$appId" --service-principal --tenant "$tenantId" --password "$pwd"

Note: Since we want to log in from a script that runs automated inside the VM to extract the metadata for an application at provisioning time (in my sample – in the real world this could also happen on a regular basis with a cron job or something similar), we need to make sure to avoid any user prompts. The latest versions of the Azure CLI prompt for telemetry data collection on the first call after installation. In an automation script you should always turn this off with the first command (azure telemetry --disable) in your script.

Step #2: A Metadata Extraction Script

Okay, now we have a Service Principal that could be used from backend jobs to extract metadata for the VM in an automated way, e.g. with the Azure CLI. Next we need a script to do exactly that. For my prototype, I’ve created a shell script (readmeta.sh) and injected it into the VM through the Custom Script Extension for Linux.

Note: Since the SAP HANA One team uses Linux as their primary OS, I developed the entire prototype with shell scripts for Linux. But fortunately, thanks to Bash on Ubuntu on Windows 10, you can also run those from your Windows 10 machine right away (if you have the 2016 Anniversary Update installed).

You can dig into the depths of the entire readmeta.sh script if you’re interested. I just extract VM and networking details in there to show how to crack the VM UUID and how to extract related items that are exposed in ARM as separate resources attached to the VM.

Let’s start with first things first: the script requires the Azure Cross-Platform CLI to be installed. On a newly provisioned Azure VM, that’s not there, so the script starts with installing the prerequisites:

sudo mkdir /home/metadata
export HOME=/home/metadata

#
# Install the pre-requisites using apt-get
#

sudo apt-get -y update
sudo apt-get -y install build-essential
sudo apt-get -y install jq

curl -sL https://deb.nodesource.com/setup_4.x | sudo -E bash -
sudo apt-get -y install nodejs

sudo npm install -g azure-cli

Important Note: Since the script will run as a Custom Script extension, it does not have things like a user HOME directory set. To make NodeJS and NPM work, we need a Home-Directory. Therefore I set the HOME to /home/metadata to which I also save all the metadata JSON responses during the script.

The next hard thing was cracking the VM Unique ID. This Unique ID has been available in Azure for some time, and it identifies a Virtual Machine for its entire lifetime in Azure. That ID changes when you take the VM off of Azure or delete it and re-create it. But as long as you just provision/de-provision or start/shutdown/start the VM, this ID remains the same.

But the key question is whether you can use that ID to find a VM using ARM REST APIs, to read metadata about itself or even change its settings through Azure Resource Manager REST APIs. Obviously, the answer is yes, otherwise I would not write this post:). But the VM ID presented in responses from Azure Resource Manager REST APIs is different from what you get when reading it inside of the VM out of its BIOS asset tags, due to endianness (byte ordering) differences, also documented here.

So in my Bash-script for reading the metadata, I had to convert the VM ID before trying to use it to find my VM through the ARM REST APIs as follows:

#
# Read the VMID from the BIOS asset tag (skip the prefix, i.e. the first 6 characters)
#
vmIdLine=$(sudo dmidecode | grep UUID)
echo "---- VMID ----"
echo $vmIdLine
vmId=${vmIdLine:6:37}
echo "---- VMID ----"
echo $vmId

#
# Now switch the order due to encoding differences between the Windows and Linux World
#
vmIdCorrectParts=${vmId:20}
vmIdPart1=${vmId:0:9}
vmIdPart2=${vmId:10:4}
vmIdPart3=${vmId:15:4}
vmId=${vmIdPart1:7:2}${vmIdPart1:5:2}${vmIdPart1:3:2}${vmIdPart1:1:2}-${vmIdPart2:2:2}${vmIdPart2:0:2}-${vmIdPart3:2:2}${vmIdPart3:0:2}-$vmIdCorrectParts
vmId=${vmId,,}
echo "---- VMID fixed ----"
echo $vmId
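
To make the byte swap easier to grasp, here is a tiny, self-contained illustration with a made-up VM ID (not a real one): the bytes of the first three groups get reversed, while the last two groups stay as they are.

# Hypothetical example: the value as read from the BIOS asset tag inside the VM...
vmIdFromBios="110e80c4-3859-4f42-9d1d-d4ba4ba7d349"
p1=${vmIdFromBios:0:8}; p2=${vmIdFromBios:9:4}; p3=${vmIdFromBios:14:4}; rest=${vmIdFromBios:19}
# ...becomes the following value as reported by the ARM REST APIs
echo "${p1:6:2}${p1:4:2}${p1:2:2}${p1:0:2}-${p2:2:2}${p2:0:2}-${p3:2:2}${p3:0:2}-${rest}"
# prints: c4800e11-5938-424f-9d1d-d4ba4ba7d349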

That did the trick to get a VM ID which I can use to find my VM through ARM REST APIs, or through the Azure CLI since I am using bash-scripts here:

#
# Login, and don't forget to turn off telemetry to avoid user prompts in an automation script.
#
azure telemetry --disable
azure config mode arm
azure login --username "$appId" --service-principal --tenant "$tenantId" --password "$pwd"

#
# Get the details for the VM and save it
#
vmJson=$(azure vm list --json | jq --arg pVmId "$vmId" 'map(select(.vmId == $pVmId))')
echo $vmJson > /home/metadata/vmmetadatalist.json
echo "---- VM JSON ----"
echo $vmJson

What you see above is that there’s currently (as of August 2016) no way to query the Azure Resource Manager REST APIs by the VM Unique ID; only attributes such as resource group and VM name can be used. Of course, that applies to the Azure CLI as well. Therefore I retrieve a list of VMs and filter it down with JQ by the VM ID, which fortunately is delivered as an attribute in the JSON response from the ARM REST APIs.

Now we have our first metadata asset: a simple list entry with basic attributes for the VM in which we are running. But what if you need more details? The obvious way is to execute an azure vm show --json command to get the full VM JSON. But even that will not include all details. Let’s say you need the public or the private IP address assigned to the VM. What you need to do then is navigate through the relationships between the Azure Resource Manager assets (the VM and the Network Interface Card resource, specifically). That is where it gets a bit tricky:

#
# Get the detailed VM JSON with relationship attributes (e.g. the NIC identified through its unique Resource ID)
#
vmResGroup=$(echo $vmJson | jq -r '.[0].resourceGroupName')
vmName=$(echo $vmJson | jq -r '.[0].name')
vmDetailedJson=$(azure vm show --json -n "$vmName" -g "$vmResGroup")
echo $vmDetailedJson > /home/metadata/vmmetadatadetails.json

#
# Then get the NIC for the VM through ARM / Azure CLI
#
vmNetworkResourceName=$(echo $vmJson | jq -r '.[0].networkProfile.networkInterfaces[0].id')
netJson=$(azure network nic list -g $vmResGroup --json | jq --arg pVmNetResName "$vmNetworkResourceName" '.[] | select(.id == $pVmNetResName)')
echo $netJson > /home/metadata/vmnetworkdetails.json

#
# The private IP is contained in the previously received NIC config (netJson)
#
netIpConfigsForVm=$(echo $netJson | jq '{ "ipCfgs": .ipConfigurations }')
echo $netIpConfigsForVm > /home/metadata/vmipconfigs.json

#
# But the public IP is a separate resource in ARM, so you need to navigate and execute a further call
#
netIpPublicResourceName=$(echo $netJson | jq -r '.ipConfigurations[0].publicIPAddress.id')
netIpPublicJson=$(azure network public-ip list -g $vmResGroup  --json | jq --arg ipid $netIpPublicResourceName '.[] | select(.id == $ipid)')
echo $netIpPublicJson > /home/metadata/vmipconfigspublicip.json
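
And just to close the loop, pulling the actual address strings out of the JSON saved above looks roughly like this (the property names match what the CLI returned for me at the time; double-check them against the saved *.json files if they have changed):

# Extract the private IP from the NIC JSON and the public IP from the public-ip resource JSON
privateIp=$(echo $netJson | jq -r '.ipConfigurations[0].privateIPAddress')
publicIp=$(echo $netIpPublicJson | jq -r '.ipAddress')
echo "Private IP: $privateIp / Public IP: $publicIp"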

This should give you enough of the needed concepts to get all sorts of VM metadata for your own VM using bash scripting. If you want to translate this to Java, .NET, NodeJS or whatever language you use, you need to look at the management libraries for the respective runtimes/languages.

Step #3: Putting it all together – the ARM template

Finally, we need to put this all together! That happens in an ARM template and the parameters this ARM template asks the user to enter at provisioning time. An ARM template similar to this one could be built for a solution-template-based Marketplace offer.

On my GitHub repository for this prototype, the ARM template and its parameters are baked into the files azuredeploy.json and azuredeploy.parameters.json. I won’t go through all details of these templates. The most important aspects are in the parameters-section and in the VM creation section where I hook up the Service Principal with the Script and attach it as a Custom Script Extension. Start with an excerpt of the “parameters”-section of the template:

"parameters": {
    "storageAccountName": {
      "type": "string"
    },
    "dnsNameForPublicIP": {
      "type": "string"
    },
    "adminUserName": {
      "type": "string"
    },
    "adminPassword": {
      "type": "securestring"
    },
    "azureAdTenantId": {
      "type": "string"
    },
    "azureAdAppId": {
      "type": "string"
    },
    "azureAdAppSecret": {
      "type": "securestring"
    },
    ...
  },
...

The important parameters are azureAdTenantId, azureAdAppId and azureAdAppSecret. Together they form the sign-in details for the Service Principal, which the script described in the previous section uses to automatically read the metadata for the VM on provisioning.

Reading the metadata is initiated through specifying my readmeta.sh-script as a custom script extension for the VM in the ARM template as below:

...
    {
      "type": "Microsoft.Compute/virtualMachines/extensions",
      "name": "[concat(parameters('vmName'),'/writemetadatajson')]",
      "apiVersion": "2015-06-15",
      "location": "[parameters('location')]",
      "dependsOn": [
        "[concat('Microsoft.Compute/virtualMachines/', parameters('vmName'))]"
      ],
      "properties": {
        "publisher": "Microsoft.OSTCExtensions",
        "type": "CustomScriptForLinux",
        "typeHandlerVersion": "1.5",
        "settings": {
          "fileUris": [
            "[concat('https://', parameters('storageAccountName'), '.blob.core.windows.net/customscript/readmeta.sh')]"
          ]
        },
        "protectedSettings": {
          "commandToExecute": "[concat('bash readmeta.sh ', parameters('azureAdTenantId'), ' ', parameters('azureAdAppId'), ' ', parameters('azureAdAppSecret'))]"
        }
      }
    }
...

Since the Azure Linux Custom Script Extension logs what it is doing very verbosely, we need to at least make sure that our sensitive data, especially the Service Principal’s password, is NOT included in those diagnostics logs, to keep it protected (well… as well as possible:)). Therefore the commandToExecute-setting is put into the protectedSettings-section, which is NOT disclosed in any diagnostics logs from the Custom Script Extension.

Important Note: On the Azure Quickstart Templates gallery there are many templates that use version 1.2 of the custom script extension. To have the commandToExecute-setting in the protectedSettings-section, you have to use a newer version. For me, version 1.5, the latest at the time of writing this post, worked; with the previous versions it just didn’t call the script.

Step #4: Trying it out…

Before you can try things out, there’s one thing you need to prepare: create the storage account and upload the readmeta.sh-script into that account (argh, next time I just write the scripts to clone my GitHub-repository:)). To make it easy, I created a script called deploy.sh with 10 parameters that does everything:

  1. Create the Resource group
  2. Create the storage account
  3. Upload the script to the storage account
  4. Update the parameters in azuredeploy.parameters.json to reflect your service principal attributes
  5. Start the deployment with the template and the updated template parameters.

While trying it out I realized that the 10 parameters make it flexible, but it’s still a hard start if you’d love to just quickly try this. So I created another bash script called getstarted.sh, which asks you for all the data interactively and calls the createsp.sh and deploy.sh scripts based on the input you entered. Just like below:

Getting Started
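
Trying it end-to-end therefore boils down to a few commands (assuming git and the Azure CLI are installed on your machine):

# Clone the prototype and run the interactive getting-started script
git clone https://github.com/mszcool/azureSpBasedInstanceMetadata.git
cd azureSpBasedInstanceMetadata
chmod +x *.sh
./getstarted.sh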

Final Words

With this in place, you have a solution that allows you to do both, reading instance metadata of the VM in which your software runs and also (with the right permissions set on the Service Principal) modify aspects of the VM through Azure Resource Manager APIs or Command Line Interfaces.

Sure, this reads like a complex, long thing. It would be much easier for instance metadata if you could get it without authentication and Service Principals. All I can say is that this will change and become easier. But for now, this is a working solution, and I hope I have provided you with valuable assets that make achieving this goal less complex for you!

And even when we have a simpler solution for instance metadata available in Azure, the content above shows you some advanced scripting concepts which I hope you can learn from. The coolest thing about it: since the Windows 10 Anniversary Update you can run all of the above on both Windows and Ubuntu Linux, BECAUSE everything is written as bash scripts.

For me, the nice side effect of this was experiencing how mature the Windows Subsystem for Linux already seems to be. What really surprised me is that I can even run Node Version Manager and build-essential on it (I even tried compiling Node.js v5 with it, and it ran through and works).

Anyways – if you have any questions, reach out to me on Twitter.

Detecting if a Virtual Machine Runs in Azure – Part 2 – Updates for Linux VMs

A few months ago I wrote a blog post about how to detect whether a virtual machine runs in Azure or not. This is vital for many independent software vendors who are planning to offer their own software through the Azure Marketplace for Virtual Machines.

The main detection strategy (Windows, Ubuntu)

In that post I explained a few tricks for detecting whether the VM runs in Azure or not, for both Windows and Linux. Still, the most reliable check known as of today is to check whether the DHCP option “unknown-245” is set in the DHCP lease options of the virtual machine.

  • Ubuntu Linux: I’ve posted a bash script in my previous blog post. Back then I generally stated that this works for Linux overall, without considering that other Linux distributions might use different configuration files for storing DHCP lease details. Hence, the following script works on Ubuntu-based flavors only:
      if `grep -q unknown-245 /var/lib/dhcp/dhclient.eth0.leases`; then
          echo "Running in an Azure VM"
      fi
    

Detecting if a CentOS VM runs on Azure

My peer and colleague Arsen Vladimirskiy pointed out that on CentOS the file for DHCP leases is stored in a different location. Hence the detection strategy for the DHCP lease option I explained in my original post does not work in CentOS-based virtual machines.

For CentOS-based virtual machines, the DHCP lease options are stored under the path /var/lib/dhclient/dhclient.leases (or, in the case of multiple network interfaces, dhclient-eth0.leases, where the eth0 part needs to be replaced with the network interface device you’re checking against).

Therefore in a default configuration with just one ethernet adapter the script needs to be updated as follows to work inside of a CentOS virtual machine:

# manually start dhclient (seems to be a workaround)
dhclient

# then check against the lease files
if `grep -q unknown-245 /var/lib/dhclient/dhclient.leases`; then
   echo "Running in Azure VM"
fi

Note: There was one weird issue I ran into when trying the approach above, which is why the script starts by launching dhclient. On a freshly deployed CentOS 7 VM in Azure from the stock marketplace image, dhclient is not started by default. Therefore files such as dhclient.leases or dhclient-*.leases do not exist by default under /var/lib/dhclient/.

Only after manually executing sudo dhclient to start the DHCP client were the files created successfully, and the check works. Now, someone could think that this might be related to static IP addresses, but in Azure that cannot be the reason: IP addresses are always assigned by the Azure DHCP server. If you want static IPs, you configure those through the Azure Portal or the Management APIs so that the Azure DHCP server always assigns the same, static IP address to the VM in the private virtual network.
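
If you want to embed that workaround a bit more defensively, a small guard like the following (my own addition, not part of the original script) only starts dhclient when no lease files exist yet:

# Start dhclient only if no dhclient lease files exist yet (e.g. on a stock CentOS 7 image)
if ! ls /var/lib/dhclient/dhclient*.leases >/dev/null 2>&1; then
    sudo dhclient
fi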

A more Complete Story for detecting DHCP unknown-245 in Linux

Now, the distributions above are very common ones, but they are by far not all of the distributions supported on Azure. The source code of the Azure Linux Agent contains all the secrets currently valid, and that is the place to look if you really want to be on the safe side across multiple Linux distributions. A few hints in the Python-based source code:

  • Lines 99-100 show the directories you should consider for your detection strategy
      VarLibDhcpDirectories = ["/var/lib/dhclient", "/var/lib/dhcpcd", "/var/lib/dhcp"]
      EtcDhcpClientConfFiles = ["/etc/dhcp/dhclient.conf", "/etc/dhcp3/dhclient.conf"]
    
  • Further down in the code starting at line 5107 there is a section that makes use of option 245 as well:
      # ... other code before
      elif option == 3 or option == 245:
          # ...
      else:
          # ...
      # ... more code goes here
    

This code has been updated to version 2.0.15 24 days before writing/publishing this post. So it should still be safe to leverage option 245 for your detection strategy. As soon as there’s something better available, I’ll definitely post another update for this blog-post!
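
Putting those hints together, a distribution-agnostic check could look roughly like the following sketch. It assumes dhclient-style lease files; dhcpcd stores its leases differently, so treat this as a starting point rather than a finished solution:

#!/bin/bash
# Scan all lease directories known to the Azure Linux Agent for DHCP option 245
for leaseDir in /var/lib/dhclient /var/lib/dhcpcd /var/lib/dhcp; do
    [ -d "$leaseDir" ] || continue
    if grep -q -s "unknown-245\|option-245" "$leaseDir"/*.lease*; then
        echo "Running in an Azure VM (option 245 found in $leaseDir)"
        exit 0
    fi
done
echo "Option 245 not found - probably not running in an Azure VM"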

Final Disclaimer

The approaches outlined above worked on both Ubuntu and CentOS 7 based VMs in Resource Manager based deployments (using the new ARM-template approach introduced by the Azure teams earlier this year) at the time of publishing this post (2015-09) during my tests. When I published the original post, I did of course test them with classic Service Management based VMs.

Therefore, and since there is still no better way available at the time of publishing this post, the options outlined in this and my original post are still valid and probably the best you can get so far for detecting whether your VM runs inside of Microsoft Azure or not…

If you found better options, don’t hesitate to contact me via my Twitter feed.

Azure VMs – SQL Server AlwaysOn Setup across multiple Data Centers fully automated (Classic Service Management)

Last December I started working with two of my peers, Max Knor and Igor Pagliai, and a partner in Madrid on implementing a cross-data-center SQL Server AlwaysOn Availability Group setup for a financial services solution which is supposed to be provided to thousands of banks across the world, running in Azure. Igor posted about our setup experience, which we partially automated with Azure PowerShell and Windows PowerShell – see here.

At the moment the partner’s software still requires SQL Server in VMs as opposed to Azure SQL Databases because of some legacy functions they use from full SQL Server – therefore this decision.

One of the bold goals was to fully enable the partner and their customers to embrace DevOps and continuous delivery across multiple environments. For this purpose we wanted to FULLY AUTOMATE the setup of their application together with an entire cross-data-center SQL Server AlwaysOn environment as outlined in the following picture:

In December we did a one-week hackfest to start these efforts. We successfully set up the environment, but only partially automated. Over the past weeks we went through the final effort to fully automate the process. I’ve published the result on my GitHub repository here:

Deployment Scripts Sample Published on my GitHub Repository

Note: Not Azure Resource Groups, yet

Since Azure Resource Manager v2, which would allow us to dramatically improve the performance and reduce the complexity of the basic Azure VM environment setup, is still in Preview, we were forced to use traditional Azure Service Management.

But about 50%-60% of the efforts we have done are re-usable given the way we built up the scripts. E.g. all the database setup and custom service account setup which is primarily built on-top of Azure Custom Script VM Extensions can be re-used after the basic VM setup is completed. We are planning to create a next version of the scripts that does the fundamental setup using Azure Resource Groups because we clearly see the advantages.

Basic Architecture of the Scripts

Essentially the scripts are structured into the following main parts which you would need to touch if you want to leverage them or understand them for learning purposes as shown below:

  • Prep-ProvisionMachine.ps1 (prepare deployment machine)
    A basic script you should execute on a machine before starting first automated deployments. It installs certificates for encrypting passwords used as parameters to Custom Script VM Extensions as well as copying the basic PowerShell modules into the local PowerShell module directories so they can be found.
  • Main-ProvisionConfig.psd1 (primary configuration)
    A nice little trick by Max to provide at least some sort of declarative configuration: a separate script file that creates an object tree with all the configuration data typically used for building up the cluster. It contains cluster configuration settings, node configuration settings and default subscription selection data.
  • Main-ProvisionCrossRegionAlwaysOn.ps1 (main script for automation)
    This is the main deployment script. It performs all the actions to setup the entire cross-region cluster including the following setups:
    • Setup your subscription if requested
    • Setup storage accounts if they do not exist, yet
    • Upload scripts required for setup inside of the VMs to storage
    • Setup cloud services if requested
    • Create Virtual Networks in both regions (Primary/Secondary)
    • Connect the Virtual Networks by creating VPN Gateways
    • Set the primary AD Forest VM and the Forest inside of the VM
    • Setup secondary AD DC VMs including installing AD
    • Provision SQL Server VMs
    • Setup the Internal Load Balancer for the AlwaysOn Listener
    • Configure all SQL VMs to have AlwaysOn enabled
    • Configure the Primary AlwaysOn node with the initial database setup
    • Join secondary AlwaysOn nodes and restore databases for sync
    • Configure a file-share based witness in the cluster
  • VmSetupScripts Folder
    This is essentially a folder with a series of PowerShell scripts that do perform single installation/configuration steps inside of the Virtual Machines. They are downloaded with a Custom Script VM Extension into the Virtual Machines and executed through VM Extensions, as well.

Executing the Script and Looking at the Results

Before executing the main command make sure to execute .\Prep-ProvisionMachine.ps1 to setup certificates or import the default certificate which I provide as part of the sample. If you plan to seriously use those scripts, please create your own certificate. Prep-ProvisionMachine.ps1 provides you with that capability assuming you have makecert.exe somewhere on your machines installed (please check Util-CertsPasswords for the paths in which I look for makecert.exe).

# To install a new certificate
.\Prep-ProvisionMachine.ps1

# To install a new certificate (overwriting existing ones with same Subject Names)
.\Prep-ProvisionMachine.ps1 -overwriteExistingCerts

# Or to install the sample certificate I deliver as part of the sample:
.\Prep-ProvisionMachine.ps1 -importDefaultCertificate

Then everything should be fine to execute the main script. If you don’t specify the certificate-related parameters as shown below I assume you use my sample default certificate I include in the repository to encrypt secrets pushed into VM Custom Script Extensions.

# Enter the Domain Admin Credentials
$domainCreds = Get-Credential

# Perform the main provisioning

.\Main-ProvisionCrossRegionAlwaysOn.ps1 -SetupNetwork -SetupADDCForest -SetupSecondaryADDCs -SetupSQLVMs -SetupSQLAG -UploadSetupScripts -ServiceName "mszsqlagustest" -StorageAccountNamePrimaryRegion "mszsqlagusprim" -StorageAccountNameSecondaryRegion "mszsqlagussec" -RegionPrimary "East US" -RegionSecondary "East US 2" -DomainAdminCreds $domainCreds -DomainName "msztest.local" -DomainNameShort "msztest" -Verbose

After executing a main script command such as the one above, you will get 5 VMs in the primary region and 2 VMs in the secondary region acting as a manual failover target.

The following image shows several aspects in action such as the failover cluster resources which are part of the AlwaysOn availability group as well as SQL Server Management Studio accessing the AlwaysOn Availability Group Listener as well as SQL Nodes, directly. Click on the image to enlarge it and see all details.

Please note that the failover in the secondary region needs to happen MANUALLY by executing either a planned manual failover or a forced manual failover as documented on MSDN. Failover in the primary region (from the first to the second SQL Server) is configured to happen automatically.

In addition, on Azure it means taking the IP cluster resource for the secondary region online, which by default is offline in the cluster setup, as you can see in the previous image.

Customizing the Parts you Should Customize

As you can see in the image above, the script creates sample databases which it sets up for synchronization across the two nodes in the primary region as part of the AlwaysOn Availability Group. This happens based on *.sql scripts you can add to your configuration. To customize the SQL scripts and the databases affected, you need to perform the following steps:

  • Create *.sql scripts with T-SQL code that creates the databases you want to create as part of your AlwaysOn Availability Group.
  • Copy the *.sql Files into the VmSetupScripts directory BEFORE starting the execution of the main script. That leads to have them included into the package that gets pushed to the SQL Server VMs
  • Open up the main configuration file and customize the database list based on the databases created with your SQL scripts as well as the list of SQL Scripts that should be pushed into osql.exe/sqlcmd.exe as part of the setup process for creating the databases.
  • Also don’t forget to customize the subscription name if you plan to not override it through the script-parameters (as it happens with the example above).

The following image shows those configuration settings highlighted (in our newly released Visual Studio Code editor which also has basic support for PowerShell):


Fundamental Challenges

The main script can primarily be seen as a PowerShell workflow (we didn’t have the time to really implement it as a Workflow, but that would be a logical next step after applying Azure Resource Groups).

It creates one set of Azure VMs after another and joins them to the virtual networks it has created before. It then executes scripts locally on the Virtual Machines, via Azure VM Custom Script Extensions, to perform the setup. Although Custom Script Extensions are cool, you face two main challenges with them, for which the overall package I published provides re-usable solutions:

  • Passing “Secrets” as Parameters to VM Custom Script Extensions such as passwords or storage account keys in a more secure way as opposed to clear-text.
  • Running Scripts under a Domain User Account as part of Custom Script Extensions that require full process level access to the target VMs and Domains (which means PowerShell Remoting does not work in most cases even with CredSSP enabled … such as for Cluster setups).

For these two purposes the overall script package ships with some additional PowerShell Modules I have written, e.g. based on a blog-post from my colleague Haishi Bai here.

Running Azure VM Custom Script Extensions under a different User

Util-PowerShellRunAs.psm1 includes a function called Invoke-PoSHRunAs which allows you to run a target script with its parameters under a different user account as part of a custom script VM Extension. A basic invocation of that script looks as follows:

$scriptName = [System.IO.Path]::Combine($scriptsBaseDirectory, "Sql-Basic01-SqlBasic.ps1") 
Write-Host "Calling into $scriptName"
try {
    $arguments = "-domainNameShort $domainNameShort " + `
                 "-domainNameLong $domainNameLong " +  `
                 "-domainAdminUser $usrDom " +  `
                 "-dataDriveLetter $dataDriveLetter " +  `
                 "-dataDirectoryName $dataDirectoryName " +  `
                 "-logDirectoryName $logDirectoryName " +  `
                 "-backupDirectoryName $backupDirectoryName " 
    Invoke-PoSHRunAs -FileName $scriptName -Arguments $arguments -Credential $credsLocal -Verbose:($IsVerbosePresent) -LogPath ".\LogFiles" -NeedsToRunAsProcess
} catch {
    Write-Error $_.Exception.Message
    Write-Error $_.Exception.ItemName
    Write-Error ("Failed executing script " + $scriptName + "! Stopping Execution!")
    Exit
}

This function allows you to run the target script either through PowerShell remoting or in a separate process. Many setup steps of the environment we set up do not actually work through PowerShell remoting, because they rely on impersonation/delegation or do PowerShell remoting on their own, which imposes several limitations.

Therefore, the second option this script provides is executing the target script as a full-blown process. Since Custom Script Extensions run as Local System, it is nevertheless not as simple as just doing a Start-Process with credentials being passed in (or a System.Diagnostics.Process.Start() with different credentials); Local System does not have those permissions, unfortunately. So the work-around is to use the Windows Task Scheduler. For such cases the function performs the following actions:

  • Schedule a task in the Windows Task Scheduler with the credentials needed to run the process as.
  • Manually start the task using PowerShell cmdLets
    • (Start-ScheduledTask -TaskName $taskName)
  • Wait for the task to be finished from running
  • Look at the exit code
  • Throw an Exception if the exit code is non-zero, otherwise assume success
  • Delete the task again from the task scheduler

This “work-around” helped us to execute the entire set of setup steps successfully. We were also discussing this with the engineers building the SQL AlwaysOn single-data-center Azure Resource Group template that is available for single-data-center deployments in the new Azure Portal today. They are indeed doing the same thing; the details are just a bit different.

Encrypting Secrets Passed to Custom Script VM Extensions

Sometimes we were just required to pass secret information to custom script extensions such as storage account keys. Since Azure VM Custom Script Extensions are logged very verbose, it would be a piece of cake to get to that secret information by doing a Get-AzureVM and looking at the ResourceExtensionStatusList member which contains the status and detailed call information for all VM Extensions.

Therefore we wanted to encrypt secrets as they are passed to Azure VM Extensions. The basic (yet not perfect) approach works based on some guidance from a blog post from Haishi Bai as mentioned earlier.

I’ve essentially written another PowerShell module (Util-CertsPasswords) which can perform the following actions:

  • Create a self-signed certificate as per guidance on MSDN for Azure.
  • Encrypt Passwords using such a certificate and return a base64-encoded, encrypted version.
  • Decrypt Passwords using such a certificate and return the clear-text password.

In our overall workflow all secrets including passwords and storage account keys which are passed to VM Custom Script Extensions as parameters are passed as encrypted values using this module.

Using Azure CmdLets, we make sure that the certificates are published with the VM as part of our main provisioning script, as per Michael Washam’s guidance from the Azure product group.

Every script that gets executed as part of a custom VM Script Extension receives an encrypted password and uses the module I’ve written to decrypt it and use it for the remaining script such as follows:

#
# Import the module that allows running PowerShell scripts easily as different user
#
Import-Module .\Util-PowerShellRunAs.psm1 -Force
Import-Module .\Util-CertsPasswords.psm1 -Force

#
# Decrypt encrypted passwords using the passed certificate
#
Write-Verbose "Decrypting Password with Password Utility Module..."
$localAdminPwd = Get-DecryptedPassword -certName $certNamePwdEnc -encryptedBase64Password $localAdminPwdEnc 
$domainAdminPwd = Get-DecryptedPassword -certName $certNamePwdEnc -encryptedBase64Password $domainAdminPwdEnc 
Write-Verbose "Successfully decrypted VM Extension passed password"

The main provisioning script encrypts the passwords and secrets using that very same module before being passed into VM Custom Script Extensions as follows:

$vmExtParamStorageAccountKeyEnc = `
    Get-EncryptedPassword -certName $certNameForPwdEncryption `
                          -passwordToEncrypt ($StorageAccountPrimaryRegionKey.Primary)

That way we at least make sure that no un-encrypted secret is visible in the Azure VM Custom Script Extension logs that can easily be retrieved with the Azure Service Management API PowerShell CmdLets.

Final Words and More…

As I said, there are lots of other re-usable parts in the package I’ve just published on my Github Repository which even can be used to apply further setup and configuration steps on VM environments which have entirely been provisioned with Azure Resource Groups and Azure Resource Manager. A few examples:

  • Execute additional Custom Script VM Extensions on running VMs.
  • Wait for Custom Script VM Extensions to complete on running VMs.
  • A ready-to-use PowerShell function that makes it easier to Remote PowerShell into provisioned VMs.

We also make use of an AzureNetworking PowerShell module published on the Technet Gallery. But note that we also made some bug-fixes in that module (such as dealing with “totally empty VNET configuration XML files”).

Generally, the experience of building these ~2500 lines of PowerShell code was super-hard but a great learning experience. I am really keen to publish the follow-up post on this that demonstrates how much easier Azure Resource Group templates make such a complex setup.

I also hope that we will have such a multi-data-center template in the default gallery soon, since it is highly valuable for all partners and customers that need to provide high availability across multiple data centers using SQL Server Virtual Machines. In the meantime we will try to provide a sample based on the work above as soon as we have the time and resources for the implementation.

Finally – thanks to Max Knor and Igor Pagliai – without their help we would not have achieved these goals at this level of completeness!

Detecting if a Virtual Machine Runs in Microsoft Azure (Linux & Windows) to Protect your Software when distributed via the Azure Marketplace

Our team has started working more and more with software vendors we categorize as “Enablers” at a global scale. Such companies provide building-block services which can be used to build finished software services that run in the cloud (or on-premises).

For such “Enablers” the Azure Marketplace is a key instrument to gain visibility and traction as well as for instantiating their services in their customers’ Microsoft Azure subscriptions.

At the moment most of these partners are working with us to deploy offerings based on templates with single or multiple Virtual Machines that run their software. Later down the path we will also enable on-boarding of “Application Services” where customers no longer have to instantiate and manage Virtual Machines.

One of the main challenges our partners face when putting their software into Virtual Machine templates that can be instantiated and/or purchased through the Azure Marketplace is protecting their software from being operated outside of Azure, since that would enable malicious people to operate the software without paying for it.

Customers have full control of VMs provisioned via the Marketplace

Since end-customers have full control of the resulting, instantiated VMs after provisioning them via the Azure Marketplace, many of our partners ask the following obvious question: how can I detect whether a Virtual Machine runs in Azure, so that my software can block itself from being started when it is not running in Azure?

Unfortunately, as of today there’s no good and simple answer to that. There are various approaches out there which I would like to summarize below. I think the best possible way as of today (April 2015) is a combination of all of these approaches, to make it as hard as possible to run your software outside of an Azure VM.

Query for DHCP Option 245

The first option is one that was originally suggested by a peer from our Azure support engineering team. It has been provided for Windows Virtual Machines as a PowerShell script and essentially performs the following two actions:

  1. Check if the VMBus driver from Hyper-V is active.
  2. If so, check the DHCP lease attributes for option “unknown-245”

The option “unknown-245” is an Azure-proprietary option which only gets issued by an Azure DHCP server. Since in Azure you always get an address via DHCP (static IPs are also managed through DHCP and the REST management API), you will always (and, in theory, only) get this option as part of the DHCP lease attributes when your machine runs in Azure.

For Windows there is a ready-made PowerShell CmdLet that allows you to detect if a VM runs in Azure: https://gallery.technet.microsoft.com/scriptcenter/Detect-Windows-Azure-aed06d51

For Linux you can create a bash script such as the following one to detect whether the option unknown-245 is present, which gives you a first indicator of whether you run in Azure or not:

if grep -q unknown-245 /var/lib/dhcp/dhclient.eth0.leases; then
    echo "Running in an Azure VM"
fi

This is currently the most widely used and simplest approach to detect whether you’re running on Azure, and it is considered “good enough”. But for some partners it is, understandably, not enough yet…

Use the Azure Agent as Detection-Strategy

On Linux VMs specifically, another approach is to read the configuration from the Microsoft Azure Agent, which is always installed on a Linux VM, and try to reach its ping counterpart on the host-agent side. If a VM does not run in Azure, trying to reach the host-agent endpoint will always result in a timeout. Here’s a sample script for doing so:

curl --connect-timeout 1 `grep FullConfig /var/lib/waagent/GoalState.1.xml | perl -pe 's/<.?FullConfig>//g; s/\s//g'` && echo azure || echo no-azure

On Windows VMs an agent is only available when you explicitly select the VM Agent (needed for VM Extensions) to be installed. Some partners check whether that agent is available and explicitly document for their customers that they MUST install the VM Agent when provisioning an image from the Azure Marketplace for their software to work correctly.
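
A very simple check of that kind could look like the following sketch. Note that the service name used here (WindowsAzureGuestAgent) is an assumption about how the VM Agent registers itself as a Windows service, so verify it on a reference VM before relying on it:

# Rough heuristic: is the Azure VM Agent service present on this Windows machine?
$agentService = Get-Service -Name "WindowsAzureGuestAgent" -ErrorAction SilentlyContinue
if ($agentService -ne $null) {
    Write-Output "Azure VM Agent found - this looks like an Azure VM"
} else {
    Write-Output "Azure VM Agent not found"
}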

Checking your external IP Address

If your Virtual Machine has a public endpoint attached, you can also verify the public IP address your VM is using when accessing other services and compare it against the IP address ranges that are reserved for Azure data centers.

The Azure data center IP address ranges can be downloaded from the Microsoft Download Center here: http://www.microsoft.com/en-us/download/details.aspx?id=41653

Services such as http://ifconfig.me/ip tell you your publicly visible IP address and can easily be used in a PowerShell or bash script:

function Get-ExternalIP {
    (Invoke-WebRequest ifconfig.me/ip).Content
}
Get-ExternalIP

A more complex script can then even automatically download the Azure IP ranges (which are stored as XML) from the Microsoft Download Center via the direct URL and check whether your external IP falls into one of those ranges, as sketched below.
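
The following is a sketch of such a check. The direct download URL changes with every update of the XML file, so the URL below is just a placeholder you would need to replace (or resolve dynamically from the download page), and the XPath assumes the IpRange/Subnet schema of that XML file:

# Placeholder URL - look up the current direct link on the Microsoft Download Center page first
$xmlUrl = "https://download.microsoft.com/.../PublicIPs.xml"
[xml]$azureRanges = (Invoke-WebRequest $xmlUrl).Content

# Determine the publicly visible IP address of this machine
$externalIp = [System.Net.IPAddress]::Parse((Invoke-WebRequest ifconfig.me/ip).Content.Trim())

function Test-IpInSubnet([System.Net.IPAddress]$ip, [string]$cidr) {
    # Convert address and subnet to 32-bit integers and compare the network bits
    $network, $maskBits = $cidr.Split('/')
    $ipBytes = $ip.GetAddressBytes(); [System.Array]::Reverse($ipBytes)
    $netBytes = ([System.Net.IPAddress]::Parse($network)).GetAddressBytes(); [System.Array]::Reverse($netBytes)
    $mask = [uint32]([math]::Pow(2, 32) - [math]::Pow(2, 32 - [int]$maskBits))
    return (([System.BitConverter]::ToUInt32($ipBytes, 0) -band $mask) -eq
            ([System.BitConverter]::ToUInt32($netBytes, 0) -band $mask))
}

$runsInAzure = $false
foreach ($subnet in $azureRanges.SelectNodes("//IpRange/@Subnet")) {
    if (Test-IpInSubnet $externalIp $subnet.Value) { $runsInAzure = $true; break }
}
Write-Output "External IP is within the Azure data center ranges: $runsInAzure"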

Leveraging the Azure REST Management API or CLI-interfaces

Finally, you could also ship your VM with either the Azure PowerShell CmdLets or the Azure Cross-Platform CLI installed and query details about your VM through the REST management API.

But that approach has one big upfront requirement: you need to somehow force the user to provide credentials or a management certificate that gives the VM access to the customer’s subscription in which the VM is deployed, so that you can query the details about the VM (which belongs to the customer’s subscription and is owned by the customer, not by you as the provider/creator of the Marketplace VM template and offering).

To get this done you need to do, for example, one of the following things:

  • Write very explicit documentation for your customers that explains what they need to do after they provisioned the VM from the Azure Marketplace into their subscription, before they can use your software in that VM or those VMs.
  • Or write a little provisioning web application which is shipped as part of the VM image and which the user needs to browse to immediately after provisioning the VM from the Marketplace. There the user enters the remaining details that enable your software or “bootstrapping scripts” to use the Azure Service Management API or CLI to query additional information about your VM and use that for detecting whether you run in Azure or not (e.g. query your public and internal IP and compare it with what your VM reports, etc.). Of course you need to make sure that this “provisioning app” is only active in the provisioned instance right after the initial creation from the Marketplace, to avoid any kind of security issues.

At some point in the near future, the Azure Marketplace will enable publishers of images to require the user to provide additional details through the Azure portal as part of the provisioning/creation process. But as long as that’s not possible, you need to look at approaches such as the ones I’ve outlined above.

Final Words

The approaches outlined above are all used today by publishers of VM templates in the Marketplace, and they work. I know they are not optimal, but we also know that the product group is aware of the challenges and will work on better solutions in the future. For now, the approaches I outlined above are easy and pragmatic ways that at least give you some level of guarantee when detecting whether a VM and your software run in Microsoft Azure (public cloud) or not…

Janet Moonshot & FreeRADIUS on Microsoft Azure – An important step for Researchers for being able to use Microsoft’s Public Cloud Platform

Over the past months I’ve spent some time working with Janet, the UK’s National Research and Education Network. As well as managing the operation and development of the Janet network, Janet runs a number of services for education and research, including operating eduroam(UK), the UK part of a large, global network established between all sorts of research and education facilities and institutions.

In addition to eduroam and other services, one of the most important projects Janet is leading is the development and standardization of an open platform for authentication, authorization and trust management based on existing standards such as EAP, RADIUS and GSS-API/SSPI/SASL:

This platform is called Project Moonshot.

Simply put, Moonshot in conjunction with FreeRADIUS is an identity provider and a security token service. It is, however, primarily based on the protocols mentioned above (EAP, RADIUS and GSS-API/SSPI/SASL) instead of WS-Federation, OAuth or SAML-P (although SAML tokens are one of the token formats supported by Moonshot).

What I personally think is really cool about Moonshot!?

The really cool and practical thing about being built on the protocols above is, in my personal opinion, that those protocols are deeply integrated into almost all relevant operating system platforms, since these standards are also used by Kerberos. For example, Windows has SSPI built deeply into its logon process, which means that with an SSPI provider for Moonshot, the Windows logon itself can be sourced from a federation through a variety of trust relationships instead of from the OS or a domain controller directly. That is something that is not possible with the more widely known, web-focused standards such as WS-Fed, OAuth or SAML-P.

Why is Moonshot so important for Microsoft and Microsoft Azure?

Independent of what I think the advantages of Moonshot are, the most important part for Microsoft and Microsoft Azure is that Janet is working with NRENs, academia and research across Europe and internationally to establish Moonshot as THE prime authentication mechanism for research communities, building up trust relationships between them and thus allowing federated authentication and authorization for research projects across the world.

In other words: if Microsoft Azure wants to play an important role in research in the future, Moonshot needs to be somehow supported in Azure as a platform. Through our partnership and work with Janet we achieved a first step for this over the past months, together!

Moonshot IdP Base Image on VMDepot…

Working together with Janet, we managed to get a base image prepared, tested and published on Microsoft Open Technologies’ VMDepot that can be used by anyone who wants to get connected to research communities through Moonshot Trust Routers and IdPs for federated security.

You can find this image here on VMDepot for getting started.

Although it seems like a simple thing to do, we had to go through a few steps to get this far. Moonshot had to be updated to support the Linux distributions officially supported on Microsoft Azure. Furthermore, we had to test whether the semantics of the protocols used, especially EAP, work well on Microsoft Azure. At least for single-VM deployments we did this, and the image above on VMDepot contains all the bits with which we’ve tested.

Of course that’s just the first step and we know we need to take some future steps such as making the deployments ready for multi-instance deployments for the sake of high availability and eventually also performance. Nevertheless, this is a great first step which was required and enables us to move forward.

The image itself is based on Ubuntu Linux 12.10 LTS; it has the Moonshot and FreeRADIUS package repositories configured correctly and other useful packages installed that are required or nice to have for Moonshot and FreeRADIUS (such as “screen”, for example).

Using the Moonshot/FreeRADIUS VMDepot Image

Next, I’d like to summarize how you can make use of the Moonshot VMDepot image. Note that you should most probably be involved in academia or research projects for this to be useful to you. Of course you can also set up your own single IdP using the image, but the full power gets unleashed when you become part of the Janet Trust Router network, which is what I am focusing on in this blog post.

Let’s start with a few assumptions / prerequisites:

  • Assumption #1:
    Since the primary target group is academia and research which is very Linux-focused, I am assuming people who’re trying this will most probably try the steps below from a Linux-machine. Therefore I am only using tools that also work on Linux (or Mac), although I am running them from a Windows machine.
  • Assumption #2:
    For the steps to complete you need an active Microsoft Azure subscription. To get one, navigate to http://azure.microsoft.com and click on the “Free Trial” button in the upper, right corner.
  • Assumption #3:
    You are able to get in touch with Janet to participate in their Moonshot pilot to get the credentials required to connect your Moonshot IdP/RP to their Trust Router Network and that way become part of the larger UK and global academia and research community implementing Moonshot.

Now let’s get started with the actual deployment of a Moonshot VM based on the image Janet and we have published together on VMDepot:

  • Install Node.js on your machine, if not done yet.
    Node.js is needed since the Microsoft Azure Cross Platform Command Line Interface which we will use for setting up the Azure environment is built with Node.js.
  • Install the Azure Cross Platform Command Line Interface (xplat CLI).
    The Azure xplat CLI is a command line interface that allows you to script many management operations for services in your Azure subscription from either a Linux, Mac or also a Windows machine. For more details on setting it up, please refer to the Azure xplat CLI homepage.
  • Import your Subscription Publish Profile Settings through the xplat CLI.
    Before you can issue any operation against your Azure subscription in the cloud through the xplat CLI, you need to download and import a credentials file. This is the only operation that requires a GUI with a web browser, so if you issue the following command, you should sit on a machine with an X server installed or be on a Mac or Windows machine. Open a shell window or a command prompt and execute the following command:
    • azure account download
      This will open a web browser and browse to a page where you’ll need to sign-in with the account that has access to your Azure subscription (your Microsoft account with which the subscription has been created or which is a Co-Admin of another subscription). It results in the download of a “xyz.publishsettings”-file which contains the credentials. Save that file to your local disk. Next execute the subsequent command:
    • azure account import <path & filename to xyz.publishsettings>
      This command makes the Azure xplat CLI aware of your credentials. After that step you can finally start issuing management commands against your subscription.
    • Note: if you have multiple Azure subscriptions, you also need to select the subscription in which you want to create the VM using azure account set <subscription-id>
  • Create a VM image based on our VMDepot base image using the xplat CLI.
    Finally we can create the Virtual Machine based on the VMDepot image. For this purpose execute the following command in your previously opened shell:
    • azure vm create yourdnsprefix -o vmdepot-28998-1-16 -l "North Europe" yourusername yourpassword --ssh
    • This command creates a VM which will get a public DNS-name called “yourdnsprefix.cloudapp.net” through which you then can connect to your VM (e.g. via SSH).
    • The result of issuing the command shows that the script transfers the template for the virtual machine with the VMDepot image id 28998-1-16 from VMDepot to your storage account, then creates the VM from that template, and finally does some clean-up.
  • Make sure Moonshot/FreeRADIUS and SSH endpoints are open on the Azure firewall.
    The last step is to open up the required TCP endpoints on the Azure firewall. This can happen after the VM has been created successfully. Ports 2083 and 12309 are required for Moonshot/FreeRADIUS; SSH is already open on port 22 since our previous command included the --ssh switch. Issue the following command:
    • azure vm endpoint create-multiple DNS_PREFIX 2083:2083,12309:12309
    • Note that I had already added port 22 beforehand, so it will not show up in the output of this command.

After you’ve completed those steps and the VM has been created successfully, you need to connect to the VM and perform the final Moonshot/FreeRADIUS configuration steps. These are pretty much the same as those you’d need to perform on an on-premises machine in your own data center; we’ve prepared everything in the image so that it works smoothly in Azure.

  • SSH into the newly created VM.
    Make sure you connect as root so you can perform all administrative tasks. All subsequent steps are to be executed in that SSH-session to your newly created VM!
  • Update to the latest package versions.
    Since the image is Ubuntu-based, use apt-get to update to the latest version of the packages. Issue the following commands: sudo apt-get update followed by sudo apt-get upgrade.
  • Update the FreeRADIUS certificate files to match your organizational values.
    As part of the bootstrapping process, Moonshot and FreeRADIUS generate certificate files required for setting up trust relationships between your RP/IdP and other RP/IdPs. These are generated through openssl based on settings-files prepared in the bootstrap-image from VMDepot. You should customize those to match your organizational values, for example such as the common name to be used for your organization and IdP. Perform the following steps:
    • Switch to the directory /etc/freeradius/certs.
    • Open the file ca.cnf and update the following values to match your own values:
      • emailAddress
      • commonName

    If you’re really taking it seriously, then also update the other values as well (e.g. the passwords for the private key files).

    • Open the file server.cnf and update the following values to match your own values:
      • emailAddress
      • commonName
    • Finally update the same values also in the file client.cnf to match your own values:
    • Execute the command sudo /etc/freeradius/certs/bootstrap. This produces a lot of output; once it completes without errors, the certificates have been regenerated with your values.
  • Fine-tune Client Private Key Files:
    Next you need to “fine-tune” the private key files for the clients. This is supposed to be something that will be fixed/made easier in future versions of Moonshot and FreeRADIUS. Perform the following steps:
    • Change to the directory /etc/freeradius/certs.
    • Run the following command: cat client.crt client.key > client.txt
    • Now overwrite client.key with client.txt by executing the following command:
      mv client.txt client.key
    • Open client.key in VI and delete all lines before the first -----BEGIN CERTIFICATE----- line.
  • Realm configuration – Part #1
    Now that all certificates are configured, you need to configure your “realm” settings, such as the name of your realm and other options Moonshot and FreeRADIUS allow you to set. For this blog post we stick with the simple creation of a realm for your setup:
    • Switch to the directory /etc/freeradius.
    • Open the file proxy.conf in VI and add the following section anywhere in the file:
      realm yourrealm.com
      {
      }
    • The realm you select should match the DNS-name you’re planning to use for setup. This DNS-name should then be mapped using a DNS CNAME-alias or DNS A-Record to your xyz.cloudapp.net setup in Azure.
    • You can look at the sample-realm configurations in the file so that you can decide which other options you’d prefer to set for your setup. For this post we keep things at a default-setup.
  • Realm configuration – Part #2:
    For the next realm-setting perform the following steps in the SSH-session:
    • Open the file /etc/freeradius/mods-enabled/realm in VI for editing.
    • Add (or adjust) the following section at any place in the file, setting rp_realm to your own server name:
      realm suffix {
          rp_realm = "yourserver.yourrealm.com"
      }
    • Make sure you use the same domain-name as before (e.g. yourrealm.com) and that the name you specify here (yourserver.yourrealm.com) is a resolvable DNS-name.
  • Modify post-authentication step that issues the SAML assertion.
    Next you need to modify the post-authentication steps. One action that happens in those post-authentication steps is the definition of SAML-assertions that get issued as a token after a successful authentication. We prepared the image with a default-template that you can customize based on your need. But even if you don’t customize the assertions, there’s one step you need to complete and that’s bringing your realm into the context of the post-authentication steps.
    • Open the file /etc/freeradius/sites-enabled/default with VI.
    • Search for the configuration section starting with post-auth { … }.
    • Modify it to issue the SAML-assertion for your realm, i.e. inside that post-auth { … } section change the line
          if (Realm == LOCAL)
      to
          if (Realm == "cloudapp.net")
      (or whichever realm you configured earlier).
  • Request and setup Trust Router Credentials (through Janet).
    As mentioned, a Moonshot/FreeRADIUS installation is most useful when you connect it to a research community. For this purpose, get in touch with Janet to join their pilot via https://www.ja.net/products-services/janet-futures/moonshot. Once accepted onto the pilot, Janet will send you Trust Router credentials which allow you to federate with the research network Janet operates.
    • Janet (or other Trust Router operators) will send you the trust router credentials for setting up the trust relationship as an XML file. Put that XML-file on your Moonshot VM created earlier.
    • Next execute the following commands (assuming the XML-file with the Trust Router credentials is called mytrustcreds.xml):
          su --shell /bin/bash freerad
          unset DISPLAY
          moonshot-webp -f mytrustcreds.xml
    • With those credentials your IdP/RP will be able to connect and federate with the Trust Router network Janet operates for academia and research (or the one you’ve received the credentials for).

Finally, that’s it; we’ve completed all steps for configuring the Moonshot/FreeRADIUS setup. Now it’s time to test the environment or start using it for your single sign-on and authentication purposes. A simple test of your environment together with Janet could look as follows:

  • Open up three terminal sessions to your Moonshot/FreeRADIUS VM you just created.
  • In Terminal #1 perform the following steps:
    • Open /etc/freeradius/users using VI.
    • Look for the following line: testuser Cleartext-Password := “testing”
    • Leave it as-is for the test or modify it as per your needs. Also, this is where you could add your own users for your IdP. If you leave it as above, those are the credentials you can use for testing.
    • Now execute the following commands:
      • su --shell /bin/bash freerad (runs a shell under the FreeRADIUS user)
      • freeradius -fxx -l stdout (runs freeradius for debugging with logging to stdout)
  • In Terminal #2 perform the following steps:
    • moonshot-webp -f <path to previously received trust router credentials XML>
    • tids <your-external-ip> trustrouter@apc.moonshot.ja.net /var/tmp/keys
      • The external IP for your Azure VM is visible in the Azure Management portal (manage.windowsazure.com) for your virtual machine.
      • trustrouter@apc.moonshot.ja.net is an example for a trusted trust router. In case you federate with Janet, that’s most likely the one you’ll use.
  • In Terminal #3 perform the following step:
    • tidc tr1.moonshot.ja.net {your rp-realm} apc.moonshot.ja.net apc.moonshot.ja.net
  • Important Note: for the commands above to succeed, you need to have valid Janet Trust Router credentials and Janet needs to have your IdP/RP configured in their trust-settings as a trusted party! Otherwise later when executing the tidc-command the test will fail!
  • Finally to complete the test someone needs to use Moonshot and its identity selector on a client machine to authenticate using your IdP. The best way to do that is using the LiveDVD for Moonshot provided by Janet.

That’s it, now you have your Moonshot/FreeRADIUS IdP to get yourself connected with a huge community of researchers, scientists and students across the world… For further questions it’s best to get in touch with the people from Janet and Moonshot via moonshot-community@jiscmail.ac.uk. And go to the Moonshot home pages to find more details here:

https://community.ja.net/groups/moonshot

https://www.ja.net/products-services/janet-futures/moonshot