Azure & Cloud Foundry – Setting up a Multi-Cloud Environment

This week I was presenting at the Cloud Foundry Summit 2016 Europe in Frankfurt, of course about running Cloud Foundry on Azure and Azure Stack. It was great being there, especially because one of the two main Global ISV partners I am working with on the engineering side was there as well and was even a Gold sponsor of the event. It was indeed an honor and a great pleasure for me to be part of this summit … and great to finally have a technical session at a non-Microsoft conference again :)

Indeed, one reason for this blog-post is that I ran out of time during my session and was only able to show small parts of the last demo.

Anyways, let’s get to the more technical part of this blog-post. My session was all about running CF in public, private as well as hybrid clouds with Azure being involved in some way. This is highly relevant since most enterprises are driving a multi-cloud strategy of some sort:

  • Either they are embracing Hybrid cloud and run deployments in the public cloud as well as in their own data centers for various reasons or
  • they want to distribute and minimize risk by running their solutions across two (or more) public cloud providers.

Although my session was focused on running Cloud Foundry on Azure, a lot of the concepts and architectural insights presented can be re-used for other kinds of deployments with other cloud vendors or private clouds as well.

The basics – Running Cloud Foundry on Azure and Pivotal

Microsoft has developed a BOSH CPI that enables BOSH-based deployments of Cloud Foundry on Azure. The CPI is developed entirely as an open source project and contributed to the Cloud Foundry Incubator on GitHub.

Based on this CPI, there are two main ways for deploying Cloud Foundry clusters on Microsoft Azure:

There is very detailed guidance available on all of those GitHub repositories explaining all the details. I would suggest following this one since it is by far the easiest: Deploy Cloud Foundry on Azure – and always follow the “via ARM templates” suggestions in the docs.

Finally, in addition to Azure, to completely follow this post you need a second CF cluster running in another cloud. The by far easiest way is to set up a trial account on Pivotal Web Services, which provides you with some sort of "Cloud-Foundry-as-a-Service". Follow these steps here for doing so…

A Multi-Cloud CF Architecture with Azure on one side

There are many reasons for multi-cloud environments. Some include running parts in private clouds for legal and compliance reasons, while others include spreading risk across multiple cloud providers for disaster recovery. The example in this post focuses exactly on that multi-cloud DR case since it covers two public cloud providers:

architecture

  • Azure Traffic Manager acts as a DNS-based load balancer. We will configure Traffic Manager with a priority policy, which essentially routes traffic to the endpoint with the highest priority and, if one cloud fails, routes traffic to the other cloud.
  • The Azure Load Balancer is a component you get "for free" in Azure and don’t really need to take care of. It balances traffic across the front nodes of your CF cluster and is automatically configured for you if you follow the guidance above for deploying CF on Azure.
  • Inside each CF cluster, we need to register the DNS names used by Traffic Manager and configure the CF routers to route requests on those domains to our apps appropriately.

Setting up traffic manager

Let’s start with setting up the Azure Traffic Manager since we’ll need its domain name for the configuration of the apps in both Cloud Foundry targets. You can just add Azure Traffic Manager as a resource to the resource group of your Cloud Foundry deployment or to any other resource group. In my case, I deployed the Traffic Manager in another resource group as shown in the following screenshot:

Traffic Manager Setup

The important piece to take away for now is the domain name of your Traffic Manager profile. The actual endpoints for Traffic Manager do not need to be configured at this point – we will look at that later.
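
If you prefer the command line over the portal, the Azure Cross Platform CLI can also show the profile including its DNS name. This is only a hedged sketch – the traffic-manager command group and argument order can differ between CLI versions, so double-check with --help:

azure config mode arm
azure network traffic-manager profile show YOUR-RESOURCE-GROUP YOUR-PROFILE-NAME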

Deploying the sample app to Pivotal Web Services

As a next step, we deploy the sample application to Pivotal Web Services and take note of the (probably random) domain name it has associated with the application.

$pivotalApiEndpoint="api.run.pivotal.io"
cf login -a $pivotalApiEndpoint
cf target -o $pivotalOrg -s $pivotalSpace
cf push -f ./sampleapp/manifest.yml -p ./sampleapp
cf set-env multicloudapp REGION "Pivotal Cloud"
cf restage multicloudapp

To get the domain name and IP, just execute cf app multicloudapp and take note of the domain name as shown in the following figure:

Pivotal App Domain Name

Deploying the App into Cloud Foundry on Azure

The deployment of the sample app into Azure goes exactly the same way, except that we’ll need to use different API end-points, organization names and spaces inside of Cloud Foundry:

$azureCfApiEndpoint="api.$azureCfPublicIp.xip.io"
cf login -a $azureCfApiEndpoint
cf target -o $azureOrg -s $azureSpace
cf push -f ./sampleapp/manifest.yml -p ./sampleapp
cf set-env multicloudapp REGION "Microsoft Azure"
cf restage multicloudapp

The Cloud Foundry API end-point I used above is the one that is registered by default when using the ARM-based deployment of open source Cloud Foundry with the Azure Quickstart Templates. The DNS-registration mechanism used there is documented here.
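
A quick side note on xip.io, since that is what makes the API endpoint above resolvable without any DNS registration: xip.io is a public wildcard DNS service, so any name of the form something.<IP>.xip.io simply resolves to <IP>. You can verify that with a plain lookup (using the public IP that shows up later in this post):

nslookup api.52.169.87.212.xip.io
# the answer should simply be 52.169.87.212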

Also note the environment variable I am setting in the scripts above using cf set-env multicloudapp REGION "xyz". It is used by our sample application (written in Ruby in this case) to output in which region the app is running. That way, we can see whether we are directed to the app deployed in Azure or the one in Pivotal Web Services.
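
If you want to double-check that the variable really made it into the app, cf env lists it (a trivial sketch; the value shows up in the user-provided section of the output):

cf env multicloudapp
# look for REGION among the user-provided environment variables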

Finally, if you’re new to Azure, the easiest way to find the public IP created for your CF cluster is to look up the public IP address resource inside the resource group of your Cloud Foundry cluster in the Azure Portal. Another way – if you are a shell scripter – is to use the following command with the Azure Cross Platform CLI:

azure network public-ip show --resource-group YOUR-RESOURCE-GROUP YOUR-IP-NAME
info:    Executing command network public-ip show
+ Looking up the public ip "YOUR-IP-NAME"
data:    Id                              : /subscriptions/YOUR-SUBSCRIPTION-ID/resourceGroups/YOUR-RESOURCE-GROUP/providers/Microsoft.Network/publicIPAddresses/mszcfbasics-cf
data:    Name                            : YOUR-IP-NAME
data:    Type                            : Microsoft.Network/publicIPAddresses
data:    Location                        : northeurope
data:    Provisioning state              : Succeeded
data:    Allocation method               : Static
data:    IP version                      : IPv4
data:    Idle timeout in minutes         : 4
data:    IP Address                      : 52.169.87.212
data:    IP configuration id             : /subscriptions/YOUR-SUBSCRIPTION-ID/resourceGroups/marioszpCfSimple/providers/Microsoft.Network/networkInterfaces/SOME-ID/ipConfigurations/ipconfig1
data:    Domain name label               : marioszpcfsimple
data:    FQDN                            : marioszpcfsimple.northeurope.cloudapp.azure.com
info:    network public-ip show command OK

Configuring Traffic Manager Endpoints

Next, we need to tell Azure Traffic Manager to which endpoints it should direct requests that arrive at the DNS record registered with Traffic Manager.

In our case, we use a simple priority-based policy, which means Traffic Manager always tries to direct requests to the endpoint with the highest priority unless that endpoint is unresponsive. (We will verify this with a quick DNS lookup right after the endpoint list below.) For full documentation about routing methods, please refer to the Azure Traffic Manager docs.

Traffic Manager Endpoints

As you can see from the above, we have two endpoints:

  • An Azure endpoint which points to the public IP that the scripts and BOSH created for us when we deployed Cloud Foundry on Azure at the beginning.
  • An external endpoint which points to the domain name that Pivotal Web Services registered for the app (something like multicloudapp-xyz-abc.cfapps.io).
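
Since Traffic Manager is purely DNS-based, a plain DNS lookup against the Traffic Manager domain shows which endpoint it currently hands out. With the Pivotal endpoint at priority 1 and healthy, the answer should be a CNAME chain ending at the cfapps.io route (domain name from my deployment, replace it with yours):

nslookup marioszpcfsummithybrid.trafficmanager.net
# or
dig +short marioszpcfsummithybrid.trafficmanager.net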

Let’s give it a try…

Now, in the previous configuration for Traffic Manager, we defined that the Pivotal deployment has priority #1 and therefore will be preferred by Traffic Manager for traffic routing. So, let’s open up a browser and navigate to the Traffic Manager DNS name for your deployment (in my screenshots and in my CF Summit session that is marioszpcfsummithybrid.trafficmanager.net):

not working

Of course, a Cloud Foundry veteran immediately spots what that means. I am not a veteran in that area, so I fell into the trap…

Configuring Routes in Cloud Foundry

What I originally forgot when setting this up was configuring routes for the Traffic Manager domain in my Cloud Foundry clusters. Without that, Cloud Foundry rejects requests coming in through that domain because it does not know about it.

We need to configure the routes on both ends to make it work. As shown below, we add the Traffic Manager domain to each cluster and ensure CF routes traffic from that domain to our multi-cloud sample app:

$trafficMgrDomain="marioszpcfsummithybrid.trafficmanager.net"

#
# First do this for Pivotal
#
cf login -a $pivotalApiEndpoint
cf target -o $pivotalOrg -s $pivotalSpace

cf create-domain $pivotalOrg $trafficMgrDomain
cf create-route $pivotalSpace $trafficMgrDomain
cf map-route multicloudapp $trafficMgrDomain

#
# Then do this for the CF Cluster on Azure
#
$azureCfApiEndpoint="api.$azureCfPublicIp.xip.io"
cf login -a $azureCfApiEndpoint
cf target -o $azureOrg -s $azureSpace

cf create-domain $azureOrg $trafficMgrDomain
cf create-route $azureSpace $trafficMgrDomain
cf map-route multicloudapp $trafficMgrDomain

Now let’s give it a try again and see what happens. This time we should see our Ruby sample app running and showing that it runs in Pivotal, since we gave the Pivotal-based deployment priority #1 in Azure Traffic Manager.
it works

Fixing Routes on Azure with Traffic Manager

Even after I did the route mapping on Azure, Traffic Manager still claimed that the Azure side of the house was Degraded, despite having the route configured. Initially, I didn’t understand why.

I didn’t have this problem when I first tried this setup. But back then, I had not assigned a DNS name to the Cloud Foundry public IP in Azure. I changed that in between because I tried something else and assigned a DNS name to the Azure public IP for the CF cluster. This led Traffic Manager to route against that DNS name instead of the IP.

To troubleshoot that, I initiated a fail-over and stopped the app on the Pivotal side (see next section) to make sure Traffic Manager would try to route to Azure. A tracert finally told me what was going on:

C:\code\github\mszcool\cfMultiCloudSample [master ≡]> tracert marioszpcfsummithybrid.trafficmanager.net

Tracing route to marioszpcfsimple.northeurope.cloudapp.azure.com [52.169.87.212]
over a maximum of 30 hops:

  1     5 ms     5 ms     4 ms  10.10.16.4
  2     2 ms     1 ms     1 ms  80.146.218.2
  3     2 ms     1 ms     2 ms  62.156.233.185
  4     5 ms     5 ms     5 ms  87.190.232.17
  5     8 ms     7 ms     7 ms  f-ed1-i.F.DE.NET.DTAG.DE [62.154.14.118]

When looking at the traced route, we immediately spot that the Traffic Manager domain gets resolved to the .cloudapp.azure.com domain of the Azure public IP. So my route configuration on the CF side of the house was incomplete: Traffic Manager probes the Azure endpoint through the custom domain assigned to the Cloud Foundry cluster’s public IP, so a route for that domain is needed as well:

cf map-route multicloudapp marioszpcfsimple.northeurope.cloudapp.azure.com

C:\code\github\mszcool\cfMultiCloudSample [master ≡]> cf routes
Getting routes for org default_organization / space dev as admin ...

space   host   domain                                            port   path   type   apps            service
dev            52.169.87.212
dev            marioszpcfsimple.northeurope.cloudapp.azure.com                        multicloudapp
dev            marioszpcfsummithybrid.trafficmanager.net                              multicloudapp
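
For completeness: just like with the Traffic Manager domain, the cloudapp.azure.com domain has to be known to the Cloud Foundry cluster on Azure before the map-route call above succeeds. A hedged sketch mirroring the commands used earlier (the variable name is chosen here purely for illustration):

$azureCfCustomDomain="marioszpcfsimple.northeurope.cloudapp.azure.com"

cf create-domain $azureOrg $azureCfCustomDomain
cf create-route $azureSpace $azureCfCustomDomain
cf map-route multicloudapp $azureCfCustomDomain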

Testing a failover

Of course, we want to test whether our failover strategy really works. For this purpose, we stop the app in the Pivotal environment by executing the following commands:

cf login -a $pivotalApiEndpoint
cf target -o $pivotalOrg -s $pivotalSpace
cf stop multicloudapp

After that, we need to wait a while until Traffic Manager detects that the application is not healthy. It then might take another few seconds or minutes until the DNS record updates are propagated and we see the failover working (the smallest DNS TTL you can set is 300s as of today).

To watch what goes on, the simplest way is to look at the Azure Portal and open up the Azure Traffic Manager configuration. At some point we should see that one of the endpoints changes its status from Online to Degraded. When opening a browser and navigating to the Traffic Manager URL, we should now get directed to the Azure-based deployment (which we can see because our app outputs the content of the environment variable we set differently for each of the deployments before):

failover test
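
If you prefer watching the failover from a terminal instead of the portal, a small loop against the Traffic Manager URL does the job. This is a simple sketch assuming the sample app is reachable over plain HTTP and prints the REGION values set earlier ("Pivotal Cloud" / "Microsoft Azure"); due to the DNS TTL it can take several minutes until the output flips:

while true; do
  curl -s http://marioszpcfsummithybrid.trafficmanager.net/ | grep -o -i -e "Pivotal Cloud" -e "Microsoft Azure"
  sleep 30
done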

Final Words

I hope this gives you a nice start in setting up a Multi-Cloud Cloud Foundry environment across Azure and a 3rd-party cloud or your own data center. I will try to continue this conversation on my blog, for sure. There are tons of other cool things to explore with Cloud Foundry in relationship to Azure, and I’ll at least try to cover some of those. Let me know what you think by contacting me through twitter.com/mszcool!

As usual – all the code is available on my GitHub in the following repository:

https://github.com/mszcool/cfMultiCloudSample

Azure Virtual Machines – A Solution for Instance Metadata in Linux (and Windows) VMs

At SAP Sapphire we announced the availability of SAP HANA on Azure. My little contribution to this was working on a case that was shown as a demo in the keynote at SAP Sapphire 2016: Sports Basement with HANA on Azure. It was meant as a showcase and proof for running HANA One workloads in Azure DS14 VMs, and it was the first productive case of HANA on Azure outside of the SAP HANA on Azure Large Instances.

While we proved we can run HANA One in DS14, what’s still missing is the official Marketplace image. We are working on the on-boarding of HANA One into the Azure Marketplace at the time I am writing this post. This post is about a very specific challenge which, as I know, many others face as well. While Azure will have a built-in solution, it is not available today (August 2016), so this might be of help for you!

Scenario: A VM reading and modifying data about itself

This is a very common scenario; HANA One needs it as well. On other cloud platforms, especially AWS, a virtual machine can query information about itself without any hurdles through an instance metadata service. On Azure, as powerful as it is, we don’t have such a service available yet (as of August 2016). To be precise, we do, but it currently delivers information about regular maintenance only. See here for further details. While a full metadata service is in the works, it is not available yet.

Instance metadata is especially interesting for software providers which want to offer their solutions through the marketplace. The metadata can be used for various aspects including association and validation of licenses or protection of software assets inside of the VM.

But what if a VM needs to modify settings through cloud provider management APIs, automatically? Even with an instance metadata service available, such requirements need a more advanced approach.

Solution: A possible approach outlined (and code on my GitHub Repo)

Based on that, I started thinking about this challenge, prototyped it and shared it with the broader technical community. With Azure having the concept of Service Principals available, I tried the following path:

  1. If we could pass in a Service Principal at the creation of the VM, we’d have all we need to call into Azure Resource Manager APIs.
  2. The VM can identify itself through its “Unique VM ID”. So we could query the Azure Resource Manager APIs and find the VM based on this ID.
  3. For Marketplace use cases it is necessary that the user is FORCED to enter the credentials. So an ARM template with mandatory parameters for passing in the Service Principal details is needed.

With this in place we can solve both problems with a single solution: equipped with the right permissions, a Service Principal can query instance metadata through Azure Resource Manager APIs and modify virtual machine settings at the same time. Indeed, the Azure Cloud Foundry BOSH solution uses that approach as well, although it does not need to “identify” virtual machines – it just creates and deletes them…

For most Marketplace vendors, including the case above, the VM needs to change details about itself. So there would need to be a way for the VM to find itself through the VM Unique ID. Since nobody was able to answer the question whether that’s possible, I prototyped it with the Azure CLI.

Important Note: This is considered a prototype to prove that what is outlined above generally works. For production scenarios you’d need to code this with professional frameworks, better protect secrets, and build this into your product.

GitHub Repository: I’ve prototyped the entire solution and published it on my GitHub Repository here:

–>> https://github.com/mszcool/azureSpBasedInstanceMetadata

Step #1: Create a Service Principal

The first step is creating a Service Principal. That is not an easy task, especially when you think about offerings in a Marketplace where business people want to have fast and simple on-boarding.

Guess why I’ve created this solution-prototype on my GitHub repository (with a blog-post that followed). The idea of that prototype is to provide a ready-to-use service that creates Service Principals in your own subscription.

I still run this on my Azure Subscription, so if you need a Service Principal and you don’t like scripting, just use my tool for creating it. Note: please use in-private browsing and sign-in with a Global Admin (or get a Global Admin who does an Admin-Consent for my tool in your tenant).

If you love scripting, then you can use tools such as the Azure PowerShell or the Azure Cross Platform CLI. In my prototype, I built the entire set of scripts with the Azure CLI and tested it on Ubuntu Linux (14.04 LTS). Even cooler, I indeed developed and debugged all the Scripts on the new Bash on Ubuntu on Windows:
Bash on Windows

The script createsp.sh is a sample which creates a Service Principal and assigns it the roles needed to read VM metadata in the subscription (it would be better to just target the resource group in which you want to create the VM… I just kept it like that for convenience).

# Each Service Principal in Azure AD is backed by an 'Application-registration'
azure ad app create --name "$servicePrincipalName" \
                    --home-page "$servicePrincipalIdUri" \
                    --identifier-uris "$servicePrincipalIdUri" \
                    --reply-urls "$servicePrincipalIdUri" \
                    --password $servicePrincipalPwd

# I use JQ to extract data out of JSON results such as the AppId
createdAppJson=$(azure ad app show --identifierUri "$servicePrincipalIdUri" --json)
createdAppId=$(echo $createdAppJson | jq --raw-output '.[0].appId')

azure ad sp create --applicationId "$createdAppId"

Note: I created the App and the Service Principal separately since the AppID is needed to log in with the Service Principal using the Azure CLI, and I needed to read both the App and the Service Principal object IDs anyway.

Note: JQ is really a handy command line tool to extract data from the neat JSON-responses of the Azure CLI. Take a look at further details here.

After the Service Principal and the App are both created, I can assign the roles to the Service Principal so that it can query the VM metadata in my subscription:

# If I would create the resource group earlier, I could use the
# --resource-group switch instead of the --subscription switch here to scope
# permissions to the resource group of the VM to-be-created, only.
azure role assignment create --objectId "$createSpObjectId" \
                             --roleName Reader \
                             --subscription "$subId" 

Finally, to complete the work, I needed the Tenant ID of the Azure AD tenant for the target subscription, which is also required for the login with a Service Principal through the Azure CLI. The following code-snippet is actually at the very beginning of the createsp.sh script:

# Get the entry for the target subscription
accountsJson=$(azure account list --json)

# The Subscription ID is needed throughout the script
subId=$(echo $accountsJson | jq --raw-output --arg pSubName $subscriptionName '.[] | select(.name == $pSubName) | .id')

# Finally get the TenantID of the Azure AD tenant which is associated to the Azure Subscription:
tenantId=$(echo $accountsJson | jq --raw-output --arg pSubName $subscriptionName '.[] | select(.name == $pSubName) | .tenantId')

With those pieces in place – the tenantId, the appId and the password selected at app-creation – we can log in with the Service Principal using the Azure CLI as follows:

azure telemetry --disable
azure config mode arm
azure login --username "$appId" --service-principal --tenant "$tenantId" --password "$pwd"

Note: Since we want to log in from a script that runs automated in the VM to extract the metadata for an application at provisioning time (in my sample – in the real world this could happen on a regular basis with a cron-job or something similar), we need to make sure to avoid any user prompts. The latest versions of the Azure CLI prompt for telemetry data collection on the first call after installation. In an automation script you should always turn this off with the first command (azure telemetry --disable).

Step #2: A Metadata Extraction Script

Okay, now we have a Service Principal that can be used from backend jobs to extract metadata for the VM in an automated way, e.g. with the Azure CLI. Next we need a script to do exactly that. For my prototype, I’ve created a shell script (readmeta.sh) which I inject into the VM through the Custom Script Extension for Linux.

Note: Since the SAP HANA One team uses Linux as their primary OS, I just developed the entire prototype with Shell-Scripts for Linux. But fortunately, due to the Bash on Ubuntu on Windows 10, you can also run those from your Windows 10 machine right away (if you have the 2016 Anniversary Update installed).

You can dig into the depths of the entire readmeta.sh script if you’re interested. I just extract VM and networking details in there to show how to crack the VM UUID and how to extract related items which are exposed as separate ARM resources attached to the VM.

Let’s start with first things first: the script requires the Azure Cross Platform CLI installed. On a newly provisioned Azure VM, that’s not there. So the script starts with installing stuff:

sudo mkdir /home/metadata
export HOME=/home/metadata

#
# Install the pre-requisites using apt-get
#

sudo apt-get -y update
sudo apt-get -y install build-essential
sudo apt-get -y install jq

curl -sL https://deb.nodesource.com/setup_4.x | sudo -E bash -
sudo apt-get -y install nodejs

sudo npm install -g azure-cli

Important Note: Since the script will run as a Custom Script Extension, it does not have things like a user HOME directory set. To make NodeJS and NPM work, we need a home directory. Therefore I set HOME to /home/metadata, to which I also save all the metadata JSON responses while the script runs.

The next hard thing was cracking the VM Unique ID. This Unique ID has been available in Azure for some time and it identifies a virtual machine for its entire lifetime in Azure. The ID changes when you move the VM away from Azure or delete it and re-create it. But as long as you just provision/de-provision or start/shutdown/start the VM, the ID remains the same.

But the key question is whether you can use that ID to find a VM through the ARM REST APIs to read metadata about itself, or even change its settings through those APIs. Obviously, the answer is yes, otherwise I would not write this post:). But the VM ID presented in responses from the Azure Resource Manager REST APIs is different from what you get when reading it inside the VM out of its BIOS asset tags – due to byte-ordering (endianness) differences, also documented here.
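
To make the conversion in the next snippet easier to follow, here is a purely made-up UUID showing the effect: the first three groups appear byte-reversed between the two representations, while the last two groups stay the same.

# inside the VM (dmidecode / BIOS asset tag): 11223344-5566-7788-99AA-BBCCDDEEFF00
# as reported by the ARM REST APIs:           44332211-6655-8877-99aa-bbccddeeff00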

So in my Bash-script for reading the metadata, I had to convert the VM ID before trying to use it to find my VM through the ARM REST APIs as follows:

#
# Read the VMID from the BIOS asset tag (skip the prefix, i.e. the first 6 characters)
#
vmIdLine=$(sudo dmidecode | grep UUID)
echo "---- VMID ----"
echo $vmIdLine
vmId=${vmIdLine:6:37}
echo "---- VMID ----"
echo $vmId

#
# Now switch the order due to encoding differences between the Windows and Linux World
#
vmIdCorrectParts=${vmId:20}
vmIdPart1=${vmId:0:9}
vmIdPart2=${vmId:10:4}
vmIdPart3=${vmId:15:4}
vmId=${vmIdPart1:7:2}${vmIdPart1:5:2}${vmIdPart1:3:2}${vmIdPart1:1:2}-${vmIdPart2:2:2}${vmIdPart2:0:2}-${vmIdPart3:2:2}${vmIdPart3:0:2}-$vmIdCorrectParts
vmId=${vmId,,}
echo "---- VMID fixed ----"
echo $vmId

That did the trick to get a VM ID which I can use to find my VM through ARM REST APIs, or through the Azure CLI since I am using bash-scripts here:

#
# Login, and don't forget to turn off telemetry to avoid user prompts in an automation script.
#
azure telemetry --disable
azure config mode arm
azure login --username "$appId" --service-principal --tenant "$tenantId" --password "$pwd"

#
# Get the details for the VM and save it
#
vmJson=$(azure vm list --json | jq --arg pVmId "$vmId" 'map(select(.vmId == $pVmId))')
echo $vmJson > /home/metadata/vmmetadatalist.json
echo "---- VM JSON ----"
echo $vmJson

What you see above is that today (as of August 2016) there’s no way to query the Azure Resource Manager REST APIs by the VM Unique ID. Only attributes such as resource group and VM name can be used, and of course that applies to the Azure CLI as well. Therefore I retrieve a list of VMs and filter it down by VM ID using JQ… fortunately the ID is delivered as an attribute in the JSON response from the ARM REST APIs.

Now we have our first metadata asset: a simple list entry with basic attributes for the VM in which we are running. But what if you need more details? The obvious way is to execute an azure vm show --json command to get the full VM JSON. But even that will not include all details. E.g. let’s say you need the public or the private IP address assigned to the VM. What you need to do then is navigate through the relationships between the Azure Resource Manager assets (the VM and the network interface card resource, specifically). That is where it gets a bit tricky:

#
# Get the detailed VM JSON with relationship attributes (e.g. the NIC identified through its unique Resource ID)
#
vmResGroup=$(echo $vmJson | jq -r '.[0].resourceGroupName')
vmName=$(echo $vmJson | jq -r '.[0].name')
vmDetailedJson=$(azure vm show --json -n "$vmName" -g "$vmResGroup")
echo $vmDetailedJson > /home/metadata/vmmetadatadetails.json

#
# Then get the NIC for the VM through ARM / Azure CLI
#
vmNetworkResourceName=$(echo $vmJson | jq -r '.[0].networkProfile.networkInterfaces[0].id')
netJson=$(azure network nic list -g $vmResGroup --json | jq --arg pVmNetResName "$vmNetworkResourceName" '.[] | select(.id == $pVmNetResName)')
echo $netJson > /home/metadata/vmnetworkdetails.json

#
# The private IP is contained in the previously received NIC config (netJson)
#
netIpConfigsForVm=$(echo $netJson | jq '{ "ipCfgs": .ipConfigurations }')
echo $netIpConfigsForVm > /home/metadata/vmipconfigs.json

#
# But the public IP is a separate resource in ARM, so you need to navigate and execute a further call
#
netIpPublicResourceName=$(echo $netJson | jq -r '.ipConfigurations[0].publicIPAddress.id')
netIpPublicJson=$(azure network public-ip list -g $vmResGroup  --json | jq --arg ipid $netIpPublicResourceName '.[] | select(.id == $ipid)')
echo $netIpPublicJson > /home/metadata/vmipconfigspublicip.json

This should give you enough of the needed concepts to get all sorts of VM Metadata for your own VM using Bash-scripting. If you want to translate this to your Java, .NET, NodeJS or whatsoever code, then you need to look at the management libraries for the respective runtimes/languages.

Step #3: Putting it all together – the ARM template

Finally we need to put this all together! That happens in an ARM template and the parameters this ARM template requires the user to enter on provisioning. An ARM template similar to this could be built for a solution-template-based Marketplace offer.

On my GitHub repository for this prototype, the ARM template and its parameters are baked into the files azuredeploy.json and azuredeploy.parameters.json. I won’t go through all details of these templates. The most important aspects are in the parameters-section and in the VM creation section where I hook up the Service Principal with the Script and attach it as a Custom Script Extension. Start with an excerpt of the “parameters”-section of the template:

"parameters": {
    "storageAccountName": {
      "type": "string"
    },
    "dnsNameForPublicIP": {
      "type": "string"
    },
    "adminUserName": {
      "type": "string"
    },
    "adminPassword": {
      "type": "securestring"
    },
    "azureAdTenantId": {
      "type": "string"
    },
    "azureAdAppId": {
      "type": "string"
    },
    "azureAdAppSecret": {
      "type": "securestring"
    },
    ...
  },
...

The important parameters are the azureAdTenantId, azureAdAppId and azureAdAppSecret parameters. Together they form the sign-in details for the Service Principal, which is used in the script described in the previous section to automatically read the metadata for the VM on provisioning.

Reading the metadata is initiated by specifying my readmeta.sh script as a custom script extension for the VM in the ARM template as shown below:

...
    {
      "type": "Microsoft.Compute/virtualMachines/extensions",
      "name": "[concat(parameters('vmName'),'/writemetadatajson')]",
      "apiVersion": "2015-06-15",
      "location": "[parameters('location')]",
      "dependsOn": [
        "[concat('Microsoft.Compute/virtualMachines/', parameters('vmName'))]"
      ],
      "properties": {
        "publisher": "Microsoft.OSTCExtensions",
        "type": "CustomScriptForLinux",
        "typeHandlerVersion": "1.5",
        "settings": {
          "fileUris": [
            "[concat('https://', parameters('storageAccountName'), '.blob.core.windows.net/customscript/readmeta.sh')]"
          ]
        },
        "protectedSettings": {
          "commandToExecute": "[concat('bash readmeta.sh ', parameters('azureAdTenantId'), ' ', parameters('azureAdAppId'), ' ', parameters('azureAdAppSecret'))]"
        }
      }
    }
...

Since the Azure Linux Custom Script Extension prints a lot of diagnostics details about what it is doing, we need to at least make sure that our sensitive data, especially the Service Principal’s password, is NOT included in those diagnostics logs to keep it protected (well… as well as possible:)). Therefore the commandToExecute setting is put into the protectedSettings section, which is NOT disclosed in any diagnostics logs from the Custom Script Extension.

Important Note: The Azure Quickstart Templates gallery contains many templates that use version 1.2 of the custom script extension. To have the commandToExecute setting in the protectedSettings section, you have to use a newer version. For me, version 1.5 – the latest at the time of writing this post – worked. With the previous versions it just didn’t call the script.

Step #4: Trying it out…

Before you can try things out, there’s one thing you need to prepare: create the storage account and upload the readmeta.sh script into that account (argh, next time I’ll just write the scripts to clone my GitHub repository:)). To make it easy, I created a script called deploy.sh with 10 parameters that does everything (a rough sketch of these steps follows right after the list):

  1. Create the Resource group
  2. Create the storage account
  3. Upload the script to the storage account
  4. Update the parameters in azuredeploy.parameters.json to reflect your service principal attributes
  5. Start the deployment with the template and the updated template parameters.
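
To give you an idea of what deploy.sh roughly automates, here is a hedged sketch with the Azure Cross Platform CLI. Exact flag names differ between CLI versions (older builds use --type instead of --kind/--sku-name for storage accounts), and the variable names below are just placeholders, so treat it as an outline rather than a copy-paste script:

azure config mode arm

# 1. + 2. resource group and storage account
azure group create "$resourceGroup" "$location"
azure storage account create "$storageAccountName" -g "$resourceGroup" -l "$location" --kind Storage --sku-name LRS

# 3. upload readmeta.sh into the 'customscript' container referenced by the ARM template
#    (assumes $storageAccountKey was looked up before, e.g. via 'azure storage account keys list')
azure storage container create customscript -a "$storageAccountName" -k "$storageAccountKey"
azure storage blob upload ./readmeta.sh customscript readmeta.sh -a "$storageAccountName" -k "$storageAccountKey"

# 4. + 5. start the deployment with the template and the patched parameters file
azure group deployment create -g "$resourceGroup" -n metadatadeployment \
      -f azuredeploy.json -e azuredeploy.parameters.json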

While trying it, I realized that the 10 parameters make it flexible, but it’s still a hard start if you just want to quickly try this. So I created another bash script called getstarted.sh. It asks you for all the data interactively and calls the createsp.sh and deploy.sh scripts based on the input you entered. Just like below:

Getting Started

Final Words

With this in place, you have a solution that allows you to do both: read instance metadata of the VM in which your software runs and (with the right permissions set on the Service Principal) modify aspects of the VM through Azure Resource Manager APIs or command line interfaces.

Sure, this reads like a complex, long thing. It would be much easier for instance metadata if you could get it without authentication and Service Principals. All I can say is that this will change and become easier. But for now, this is a solution, and I hope I’ve provided you with valuable assets that make achieving this goal less complex for you!

And even once we have a simpler solution for instance metadata available in Azure, the content above shows some advanced scripting concepts which I hope you can learn from. The coolest thing about it: since the Windows 10 Anniversary Update you can run all of the above on both Windows and Ubuntu Linux, BECAUSE everything is written as Bash scripts.

For me the nice side-effect of this was experiencing how mature the Linux Subsystem for Windows already seems to be. What really surprised me is that I can even run Node Version Manager and build-essential on it (I even tried compiling Node.js v5 with it, and it ran through and works).

Anyways – if you have any questions, reach out to me on Twitter.

NServiceBus, Azure Service Bus and Service Bus for Windows Server – A PoC for a Hybrid Cloud / Portable Solution

NServiceBus is a very popular messaging and workflow framework for .NET developers across the globe. This week a few peers of mine and I worked with one of our global ISV partners to evaluate whether NServiceBus can be used for hybrid cloud and portable solutions that can be moved seamlessly from on-premises to the public cloud and vice versa.

My task was to evaluate whether NServiceBus can be used with both Microsoft Azure Service Bus in the public cloud and Service Bus 1.1 for Windows Server in the private cloud. It was a very interesting collaboration, and I finally got to write some prototype code for one of our partners again. How cool is that – doing interesting stuff that at the same time helps a partner. That’s how it should be!

Part #1: On-Premises Service Bus 1.1 Environment

The journey and prototyping began with setting up an on-premises Service Bus 1.1 environment in my home lab. Fortunately there are some good instructions out there, but of course nothing goes without pitfalls. Here’s a good set of instructions to start with – note that I did set up an entire Azure Pack Express environment, which is clearly optional. But it makes things more convenient, especially for presentations, since it provides the nice, good old Azure Management Portal experience for your Service Bus on-premises. Here’s where you should look at how to set things up:

  • Install the Azure Pack Express Setup
    • This shows, how-to setup a basic Azure Pack environment using Web Platform Installer.
    • I did not run into any problems installing it on a Hyper-V Box on my Home Lab. So it should be fairly straight forward.
    • You can install all on a single machine. What you need is SQL Server (Express is sufficient) pre-installed.
  • Install Service Bus for Windows Server
    • Again this happens via Web Platform Installer. With this I had a little challenge: unfortunately, as of writing this article, the link to the required version of the Windows Fabric in the Web Platform Installer was broken. I’ve uploaded it on my public OneDrive for convenience – you find it here. But I’ve had conversations with the product team and they will fix the broken link, so when you try it, it might work already.
  • Configure Service Bus using the Wizard.
    • After installing you need to configure Service Bus for Windows Server. That happens through a Wizard. It essentially allows you to configure endpoints, ports and certificates used for security purposes for Service Bus 1.1 for Windows Server.
    • The configuration failed on the first attempt because something went wrong with installing the Service Bus patch. Uninstalling and re-installing solved the problem.
  • Configure Service Bus for Windows Server for the Azure Pack Portal
    • That’s the final step to get the Azure Pack management portal experience for Service Bus 1.1 for Windows Server.
    • If you are fine with managing Service Bus through PowerShell, you can skip the entire Azure Pack Express stuff, start with Service Bus 1.1 right away and manage it through PowerShell.

At the end of this journey, which took me about half a day overall starting from scratch and figuring out the little gotchas mentioned above, I had a lab environment to test against. I am a fan of using Royal TS, hence the screen shot with my on-premises Service Bus as web pages embedded in Royal TS:

Part #2: NServiceBus and Azure Service Bus

The second part of the challenge was easy – figuring out whether NServiceBus already supports Azure Service Bus. Because that would give us a good starting point, wouldn’t it!? Here are the docs for the NServiceBus transport extension for Azure Service Bus as well as its source code.

But the point is: the earliest version that supports Azure seems to be NServiceBus v5.0.0, and the code-base starts with a Microsoft.ServiceBus.dll above 3.x. That version of the library is not compatible with Service Bus 1.1 for Windows Server. So I had to dig into the source code and back-port the library. Fortunately, Particular open-sources most of the framework’s bits and pieces on GitHub – including the NServiceBus.AzureServiceBus connector here.

Note that I am referring to version 6.2 of the implementation directly, since that works with NServiceBus 5.0.0, which our global ISV partner is using at this point in time. I also tried back-porting the current development branch, but that turned out to be way more complex and risky. And it was not needed for the partner, either:)

Part #3: Back-Porting to Microsoft.ServiceBus.dll v2.1

So the needed step is to back-port to a Service Bus SDK library that also works with Service Bus 1.1. Service Bus for Windows Server recently received a patch to work with .NET 4.6.1, but it has not received any major updates since its original release, so it is behind with regards to its APIs compared to Service Bus in Azure.

I’ve done all of the steps below on my GitHub repository in a fork of the original implementation. Note that you should only look at my work in the branch ‘support-6.2’ which is the one that works with NServiceBus 5.0.0. The rest is considered to be experiments as we speak right now:)

Here is the link to my GitHub repo and the fork!!

The first step for doing so was to remove the NuGet package and replace it with one that works with Service Bus for Windows Server. Fortunately, Microsoft released a separate NuGet package – ServiceBus.v1_1 – that contains the version compatible with Service Bus 1.1 for Windows Server.

The rest was all about finding where the NServiceBus implementation uses features that are not available in version 2.1 of the Service Bus SDK and testing it against my Service Bus 1.1 for Windows Server lab setup. I think the best way to see what actually changed is to look at the change-logs on my GitHub repository:

  • Initial back-port with most code changes (click here to open details on GitHub)
    • Update the NuGet Package to “ServiceBus.v1_1” instead of “WindowsAzure.ServiceBus”.
    • Remove EnablePartitioning because that’s not supported on SB 1.1.
    • Use MessagingFactory.CreateFromConnectionString() instead of MessagingFactory.Create() because the latter does not assume different ports on different endpoints for the different APIs Service Bus exposes. But that’s typically the case on default setups of Service Bus for Windows Server (see my first screenshot).
    • I also added some regular expressions to detect whether a Service Bus connection string targets on-premises or the public cloud, to keep most of the default behaviors when connecting against the public cloud. See the code-snippet below. It might not be complete or perfect, but it fulfills the basic needs.
  • Added some samples (click here to open details on GitHub)
    • This contains a basic Sender and Receiver implementation that uses the transport.
    • You need to set the environment variable AzureServiceBus.ConnectionString in a command prompt and start Visual Studio from that prompt to execute successfully. Btw., that’s also needed if you want to run the tests. In that case you also need to set AzureServiceBus.ConnectionString.Fallback to an alternate Service Bus connection string.

Here is the little code-snippet that checks if the code is used for on-premises Service Bus services or for Azure Service Bus instances:

class CreatesMessagingFactories : ICreateMessagingFactories
{
    #region mszcool 2016-04-01

    // mszcool - Added Connection String parsing to detect whether a public or private cloud Service Bus is addressed!
    public static readonly string Sample = "Endpoint=sb://[namespace name].servicebus.windows.net;SharedAccessKeyName=[shared access key name];SharedAccessKey=[shared access key]";
    private static readonly string Pattern =
        "^Endpoint=sb://(?<namespaceName>[A-Za-z][A-Za-z0-9-]{4,48}[A-Za-z0-9]).servicebus.windows.net/?;SharedAccessKeyName=(?<sharedAccessPolicyName>[\\w\\W]+);SharedAccessKey=(?<sharedAccessPolicyValue>[\\w\\W]+)$";

    public static readonly string OnPremSample = "Endpoint=[namespace name];StsEndpoint=[sts endpoint address];RuntimePort=[port];ManagementPort=[port];SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=[shared access key]";
    private static readonly string OnPremPattern =
        "^Endpoint=sb\\://(?<serverName>[A-Za-z][A-Za-z0-9\\-\\.]+)/(?<namespaceName>[A-Za-z][A-Za-z0-9]{4,48}[A-Za-z0-9])/?;" +
        "StsEndPoint=(?<stsEndpoint>https\\://[A-Za-z][A-Za-z0-9\\-\\.]+\\:[0-9]{2,5}/[A-Za-z][A-Za-z0-9]+)/?;" +
        "RuntimePort=[0-9]{2,5};ManagementPort=[0-9]{2,5};" +
        "SharedAccessKeyName=(?<sharedAccessPolicyName>[\\w\\W]+);" +
        "SharedAccessKey=(?<sharedAccessPolicyValue>[\\w\\W]+)$";

    private bool DetectPrivateCloudConnectionString(string connectionString)
    {
        if (Regex.IsMatch(connectionString, OnPremPattern, RegexOptions.IgnoreCase))
            return true;
        else if (Regex.IsMatch(connectionString, Pattern, RegexOptions.IgnoreCase))
            return false;
        else {
            throw new ArgumentException($"Invalid Azure Service Bus connection string configured. " +
                                        $"Valid examples: {Environment.NewLine}" +
                                        $"public cloud: {Pattern} {Environment.NewLine}" +
                                        $"private cloud (SB 1.1): {OnPremPattern}");
        }
    }

    #endregion

    ICreateNamespaceManagers createNamespaceManagers;
    // ... rest of the implementation ...
}

The part where I needed this detection most was deciding how to instantiate the MessagingFactory. This is the relevant piece of code – note that MessagingFactory.Create() with the NamespaceManager address passed in only works in the public cloud, not with Service Bus 1.1 on Windows Server:

class CreatesMessagingFactories : ICreateMessagingFactories
{
    // ... earlier stuff in that class including 'DetectPrivateCloudConnectionString' ...

    ICreateNamespaceManagers createNamespaceManagers;

    public CreatesMessagingFactories(ICreateNamespaceManagers createNamespaceManagers)
    {
        this.createNamespaceManagers = createNamespaceManagers;
    }

    public MessagingFactory Create(Address address)
    {
        var potentialConnectionString = address.Machine;
        var namespaceManager = createNamespaceManagers.Create(potentialConnectionString);

        // mszcool - Updated to detect if Service Bus 1.1 for Windows Server is used
        if (DetectPrivateCloudConnectionString(potentialConnectionString))
        {
            // mszcool - Need to use this approach because different ports for control and transport endpoints are used
            return MessagingFactory.CreateFromConnectionString(potentialConnectionString);
        }
        else {
            var settings = new MessagingFactorySettings
            {
                TokenProvider = namespaceManager.Settings.TokenProvider,
                NetMessagingTransportSettings =
                {
                    BatchFlushInterval = TimeSpan.FromSeconds(0.1)
                }
            };
            return MessagingFactory.Create(namespaceManager.Address, settings);
        }
    }
}

Finally, with those fixes incorporated, I was able to get almost everything working, and all except two tests passing for now. For the proof-of-concept that is good enough, since it proves that the partner could achieve what they need to achieve.

Part #4: See it in Action

Now comes the cool part – testing it out and seeing it in action. The samples I’ve added to the git repository are simple messaging examples which I’ve modified from the NServiceBus samples repository. Note that I’ve taken the non-durable MSMQ sample as the basis, since the starting point for the partner was MSMQ and I wanted something super-simple to start with. That is just how far I got; eventually I'll try other samples (but no promise at this time:)). Below is the code-snippet of the sender – the receiver looks nearly identical and you can look it up in my repository on GitHub:

static void Main()
{
    string connStr = System.Environment.GetEnvironmentVariable("AzureServiceBus.ConnectionString");

    Console.Title = "Samples.MessageDurability.Sender";
    #region non-transactional
    BusConfiguration busConfiguration = new BusConfiguration();
    busConfiguration.Transactions()
        .Disable();
    #endregion
    busConfiguration.EndpointName("Samples.MessageDurability.Sender");
    busConfiguration.ScaleOut().UseSingleBrokerQueue();
    busConfiguration.UseTransport<AzureServiceBusTransport>()
        .ConnectionString(connStr);
    busConfiguration.UseSerialization<JsonSerializer>();
    busConfiguration.EnableInstallers();
    busConfiguration.UsePersistence<InMemoryPersistence>();

    using (IBus bus = Bus.Create(busConfiguration).Start())
    {
        bus.Send("Samples.MessageDurability.Receiver", new MyMessage());
        Console.WriteLine("Press any key to exit");
        Console.ReadKey();
    }
}

Here’s the code actually in action and working. You see what it produced on my on-premises Service Bus as well as the log output from the console windows. Note that the message handler part of the receiver outputs that it received a message.

One thing I played around with was having two message handlers, which is why you see two output lines for one single message in my receiver. To clarify, here’s the code of MyHandler.cs from the Receiver project which outputs those lines:

public class MyHandler : IHandleMessages<MyMessage>
{
    static ILog logger = LogManager.GetLogger<MyHandler>();

    public void Handle(MyMessage message)
    {
        logger.Info("Hello from MyHandler");
    }
}

public class MyHandler2 : IHandleMessages<MyMessage>
{
    static ILog logger = LogManager.GetLogger<MyHandler2>();

    public void Handle(MyMessage message)
    {
        logger.Info("Hello from MyHandler2!");
    }
}

Final Words

I think this proof-of-concept we’ve built, alongside the other aspects we covered for that global software vendor partner in the UK, demonstrates several things:

  • That it is possible to have a solution that works (nearly) seamlessly on-premises and in the public cloud with largely the same code base.
    • The situation should DRAMATICALLY improve once Microsoft has released the Azure Stack, which is the successor of what I’ve used here (which was the Azure Pack).
    • We can expect that Azure Stack will deliver a much more up-to-date and consistent experience with Azure in the public cloud once it is fully available, incl. Service Bus.
  • That NServiceBus, one of the most important 3rd-party middleware frameworks, plays very well together with Azure and that it can also be used with Service Bus on-premises with some caveats (like back-porting the transport library).
    • An alternative, which I also tried to demonstrate with the simple sample, would be to use the MSMQ NServiceBus transport on-premises and the AzureServiceBus transport from NServiceBus for public cloud deployments. As long as only features supported on both sides are used, that might be the preferred way, since then you can fully rely on code delivered by NServiceBus without any changes.

Note that my attempts are meant to be a Proof-of-Concept, only. You can look at them, try them and even apply them for your solutions fully at your own risk:)

I think it was a great experience working with the team in the UK on this part of a larger proof-of-concept (which also included e.g. Azure Service Fabric for software that needs to be portable between on-premises and the public cloud but wants to make use of a true Platform-as-a-Service foundation).

I hope you enjoyed reading this and found it interesting and useful.

How-to remove secrets from the entire history of a Git-repository!!

I really do like Git a lot, and even for my private projects I use it as the default. But some aspects of it are quite tricky. A well-known practice is that you should never check secrets, or things you don’t want to share with others, into a Git repository. That is especially relevant for public repositories hosted on e.g. GitHub.

Well, saying you should not and actually never forgetting about it are two different things. Sometimes it just happens. And even if you are careful with secrets, it can also be other stuff you checked in but didn’t want to share with others. So it happened to me when I wrote the last blog-post about automating my developer machine setup and published my machine setup script.

The secrets in the history on GitHub!?

As explained in my previous blog-post, the setup automation script I use for setting up a fresh developer machine also clones a handful of repositories which are of relevance to me and/or to which I contributed some code. The majority of those repositories are public on GitHub. But some of them are from real-world projects with our customers and partners and are hosted in a private VSTS environment we run for our global team. I accidentally published that list of git clone commands as well.

No passwords, no secrets – but the repository names sometimes contained the names of the partners/customers and some of that work is not done or public, yet. So even though these were not secrets, I am not supposed to share them, yet.

Unfortunately, I realized that only after a few check-ins. So the entire history of my public GitHub repository contained those repository names from an internal VSTS environment which I didn’t want to share. Damn… the post is out, the link points to the repository… what to do?

How-to remove secrets/content from the entire history with Git?

Of course the “easy” way for this specific case would have been to delete the repository and re-create it with the fixed file published. That works for cases where the history is not really important and where you have a truly small repository. In other words, it works for samples and the like. But I had even received a pull-request for that file which I didn’t want to lose, either. By all means, deleting and re-creating is not something that should be considered a solution for this problem.

So, I did a little Internet search and came across something that, I guess, can save many GitHub repositories from deletion when things need to be removed from the entire history:

The BFG Repo Cleaner

This is an awesome tool if you run into the problem I had. Let’s say you have published something into a Git repository across multiple commits and pushes that you want to get rid of across the entire history. All you need to do are the following steps:

  1. Download the BFG Repo Cleaner into a local directory of your choice.
    1. The app is written in Scala
    2. It requires a Java-runtime on your machine.
    3. It is distributed as a JAR-package that contains all dependencies.
  2. Open up a command prompt and switch to a temporary directory.
    1. I did this in a temp-directory because it requires a new git clone --mirror of your repository which is a 1:1 mirror of the remote repository.
    2. After that you need to push that mirror back to the remote repository again. And then you can delete the mirror and return to your ordinary repository clone.
  3. Perform a clone of your repository with the option --mirror (I am using my devmachinesetup-repo here since I had to do it with this one, so just replace devmachinesetup with any of your repository-names in the commands below).
    1. git clone --mirror https://github.com/mszcool/devmachinesetup.git
    2. This clones a mirror of your remote repository with the entire history into a sub-folder of the current folder called devmachinesetup.git.
  4. Stay in the folder that contains the devmachinesetup.git folder with the mirrored repository in it.
  5. Create a text file that contains the text you want to purge from the history of all files in your git repository.
    1. Each line contains a string (incl. spaces, special characters etc.) that you want to remove. In my case these strings were the complete git clone <<repositoryname>> commands which I wanted to remove from the commit history of the script in the repository. Each line in this text-file contained one of those entire commands.
    2. BFG searches every file in your git-mirror folder and replaces each instance of each line from the text-file with the text *** REMOVED *** in the target files of the repository.
    3. A little sample excerpt of the content of that text file shows how simple it is – in my case it was just one git clone command per line which I wanted to remove from the history:
      git clone https://xyz.visualstudio.com/_DefaultCollection/first.git
      git clone https://xyz.visualstudio.com/_DefaultCollection/second.git
      git clone https://xyz.visualstudio.com/_DefaultCollection/third%20complex%20name.git thirdrepo
  6. Execute the BFG command. Note that BFG is based on the Java-runtime, so either add the folder with BFG JAR-package to your CLASSPATH environment variable or specify the full path to the JAR-package when executing Java. This looks similar to:
    1. java -jar C:\Temp\bfg-1.12.8.jar --replace-text myunwantedtext.txt devmachinesetup.git
    2. Note that when you download the BFG JAR package, the version in the name of the .jar-file might be different.
    3. The file myunwantedtext.txt contains the full list of strings to be removed, as created in step 5 above.
  7. Now BFG has replaced the unwanted content in the local mirror. Last but not least you need to push that one back to the remote repository (a condensed sketch of the full sequence follows right after this list).
    1. In your command prompt window, change into the devmachinesetup.git sub-directory that contains your git-mirror.
    2. Execute git push to push the mirror back.
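
To summarize, here is the condensed end-to-end sequence from the steps above (jar path and repository name as used in this post; the reflog/gc step is the optional clean-up the BFG documentation recommends before pushing):

git clone --mirror https://github.com/mszcool/devmachinesetup.git
java -jar C:\Temp\bfg-1.12.8.jar --replace-text myunwantedtext.txt devmachinesetup.git

cd devmachinesetup.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push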

That’s it, you’re done. After I executed the steps above on my repository, I checked several commits online to see whether it worked. In your case, the result should look similar to what I achieved once you’ve completed the steps above:

Results of Removing unwanted content

Final words

Removing unwanted content from the entire history of a Git-repository is needed sometimes. Whether it’s about accidental commits of secrets or other (sensitive) content or e.g. large files you want to clean up from your repository.

The BFG Repo Cleaner is a handy tool for such cases. It can indeed be used for cases such as the one I described. But it also contains options for other cases such as removing large files from the history of your repository which are not needed there, anymore.

BFG is cool and handy, but for more advanced scenarios you might need to fall back to the way more powerful, yet much more complex git-filter-branch tool (here). I guess that for 80% of the cases BFG is good enough, and given that it is super-easy to use I’d give it a chance first before digging through the docs of git-filter-branch.
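
For reference, the classic git-filter-branch incantation for purging a single file from all branches and tags looks like the following – the file path is of course illustrative, and the command rewrites history just like BFG does, so use it with the same care:

git filter-branch --force --index-filter \
  "git rm --cached --ignore-unmatch path/to/file-with-secrets" \
  --prune-empty --tag-name-filter cat -- --all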

Kudos to the folks who built BFG… great job and thank you very much for saving my day (I will donate;))…