bright ideas

How to save time and money with Azure automation

Applying automation to any application or platform can yield significant benefits for any organisation. There are many objectives for introducing automation; however, for the purposes of this blog I’d like to explore these typical categories:

  • Improving customer experience
  • Minimising operational costs
  • Reducing human effort

The value proposition of automation in the Microsoft Azure platform is no different from that of on-premises infrastructure, and there are several opportunities for automation to achieve all of the above objectives. Importantly, given the very real costs of public cloud, the cost savings that can be achieved can be significant.

Now let’s explore some of these aspects in more detail.

Improving customer experience

Technology exists to serve the user or consumer experience. It’s pretty much as simple as that.

The example I’ll detail below is admittedly centred on Azure Cloud Services, which are currently a slight departure from the more traditional Virtual Machine construct that still largely represents the enterprise IT landscape. However, automation in this area has had a marked impact on service quality and customer experience, so I thought it a worthy inclusion.

At ViFX, in our role managing a customer’s Azure infrastructure, we observed a recurring pattern across a series of deployments: instances exhausted their memory, which in turn degraded service for end users.

Prior to automation being implemented the resolution was to:

  1. Identify the specific Cloud Service instance
  2. Log in to the Azure portal
  3. Select the correct Subscription
  4. Navigate to the affected Cloud Service
  5. Select the problem instance and reboot it

Because the other running instances were cookie-cutter copies of the same template serving the same application, it was almost certain that the same symptom would affect them at some point in the future, requiring the same manual resolution, and only after it had already caused a service impact.

Clearly this is not an ideal resolution, so we implemented automation that sequentially reboots all instances of a Cloud Service. The sequential part is critically important, so that the automation does not itself cause a service outage. Specifically, the automation must wait for the rebooted Cloud Service instance to return to a ready state for a number of successive check intervals. At that point the instance is considered to have stabilised, and the next instance is rebooted. The schedule was set to run frequently enough to proactively recycle all of the instances before memory exhaustion could become a problem.
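The rolling reboot described above can be sketched as follows. This is a minimal illustration, not the ViFX implementation: `reboot` and `get_status` are hypothetical callables standing in for whatever Azure API or tooling issues the reboot and reads the instance status, and the `"Ready"` status string, the three successive checks and the 60-second interval are assumed values.

```python
import time
from collections import deque
from typing import Callable, Iterable, List

def rolling_reboot(
    instances: Iterable[str],
    reboot: Callable[[str], None],        # hypothetical: issues the reboot
    get_status: Callable[[str], str],     # hypothetical: reads the instance status
    ready_status: str = "Ready",          # assumed status string; real APIs differ
    required_ready_checks: int = 3,       # successive ready polls before moving on
    check_interval_s: float = 60.0,
    max_checks: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Reboot instances one at a time, moving on only once each has stabilised."""
    rebooted = []
    for instance in instances:
        reboot(instance)
        consecutive_ready = 0
        for _ in range(max_checks):
            sleep(check_interval_s)
            if get_status(instance) == ready_status:
                consecutive_ready += 1
                if consecutive_ready >= required_ready_checks:
                    break  # stable: safe to reboot the next instance
            else:
                consecutive_ready = 0  # any non-ready poll restarts the count
        else:
            # Never stabilised: stop rather than take down another instance.
            raise RuntimeError(f"{instance} did not stabilise; halting")
        rebooted.append(instance)
    return rebooted

# Simulated run: after a reboot each instance reports "Busy" twice, then "Ready".
pending = {}

def fake_reboot(name: str) -> None:
    pending[name] = deque(["Busy", "Busy"])

def fake_status(name: str) -> str:
    return pending[name].popleft() if pending[name] else "Ready"

print(rolling_reboot(["WebRole_IN_0", "WebRole_IN_1"], fake_reboot, fake_status,
                     sleep=lambda seconds: None))
# ['WebRole_IN_0', 'WebRole_IN_1']
```

Halting on an instance that never stabilises is a deliberate choice: carrying on would risk recycling healthy instances while one is already out of service.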

This automation alone had a profound impact on service delivery and drew positive feedback from our customer’s end users. Ultimately, it delivered a very simple insurance policy for the availability and quality of service delivery.

Minimising operational costs

Because this area has a direct financial dimension, it is unsurprisingly one where a lot of focus and effort has been placed. ViFX has introduced automation in a number of areas within managed Azure infrastructures specifically to reduce cost.

Let’s consider a single example which will open your eyes to the opportunity and value that can be derived from automation.

Non-production resources

For the purposes of this blog, we’ll define a non-production resource as anything that is not required on a 24x7 basis. This tends to encompass Dev, Test, UAT and Training environments, which are typically required during business hours only.

For the sake of example, let’s assume that we’re looking at a UAT environment that requires a P6 database and four A3 Cloud Service instances to provide the necessary performance. Running all of these resources for a full month in the Australia East region costs a total of $6,824.

How can these costs be aligned with actual usage requirements?

Azure has a native ability to scale Cloud Services based on configurable, time-based schedules. This allows the Cloud Services to be automatically scaled up and down in line with business hours. Scaling down to a single instance out of hours can save $941 a month on its own.
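Azure's schedule-based scaling is configured in the platform rather than written as code, but the rule it applies in this scenario can be sketched as below. The 09:00 to 17:00 weekday window is an assumption (the post only specifies 8 business hours), times are treated as naive local time, and the function names are illustrative.

```python
from datetime import datetime

# Hypothetical business hours: 8 hours (09:00-17:00), Monday to Friday, local time.
BUSINESS_START_HOUR = 9
BUSINESS_END_HOUR = 17

def is_business_hours(now: datetime) -> bool:
    # weekday() is 0 for Monday through 4 for Friday
    return now.weekday() < 5 and BUSINESS_START_HOUR <= now.hour < BUSINESS_END_HOUR

def target_instance_count(now: datetime, business_count: int = 4,
                          off_hours_count: int = 1) -> int:
    """Instance count a time-based schedule would request at `now`."""
    return business_count if is_business_hours(now) else off_hours_count

print(target_instance_count(datetime(2015, 9, 23, 10, 0)))  # 4 (Wednesday morning)
print(target_instance_count(datetime(2015, 9, 23, 20, 0)))  # 1 (Wednesday evening)
print(target_instance_count(datetime(2015, 9, 26, 10, 0)))  # 1 (Saturday)
```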

The majority of the cost, however, lies in the database, which cannot natively be scaled automatically or on a schedule.

To solve this, ViFX has developed automation that scales the database down to a Standard tier S0 database ($21 per month) outside business hours, and back up to the required P6 tier for the 8 business hours when that performance is actually required.
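A minimal sketch of the tier decision, assuming the scaling is issued as T-SQL: Azure SQL Database accepts `ALTER DATABASE ... MODIFY (EDITION = ..., SERVICE_OBJECTIVE = ...)` to change tier, and the current tier can be read with `DATABASEPROPERTYEX(..., 'ServiceObjective')`. The function name and database name are hypothetical, connection handling and scheduling are omitted, and this is not necessarily how the ViFX automation works.

```python
from typing import Optional

# Tiers from the example: P6 for business hours, S0 otherwise.
BUSINESS_TIER = ("Premium", "P6")
OFF_HOURS_TIER = ("Standard", "S0")

def scaling_statement(database: str, current_objective: str,
                      business_hours: bool) -> Optional[str]:
    """Return T-SQL moving `database` to the tier for this period, or None if no change is needed.

    Assumes a simple database name (no ']' characters) so bracket-quoting is safe.
    """
    edition, objective = BUSINESS_TIER if business_hours else OFF_HOURS_TIER
    if current_objective == objective:
        return None  # already on the right tier; skip a needless scale operation
    return (f"ALTER DATABASE [{database}] "
            f"MODIFY (EDITION = '{edition}', SERVICE_OBJECTIVE = '{objective}');")

print(scaling_statement("UAT_DB", "P6", business_hours=False))
# ALTER DATABASE [UAT_DB] MODIFY (EDITION = 'Standard', SERVICE_OBJECTIVE = 'S0');
print(scaling_statement("UAT_DB", "S0", business_hours=False))
# None
```

Checking the current tier first keeps the job idempotent, so it can run on a frequent schedule without triggering repeated scale operations.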

So what savings have been returned here? Scaling the database down out of business hours returns a massive $3,927 a month. Together with the Cloud Services scaling, that is a total of $4,868 of the original $6,824 returned to the bottom line, a saving of over 70%!
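These savings can be reproduced from the monthly prices quoted in this post under one assumption the post does not state explicitly: the 8 business hours apply on weekdays only, so resources run at reduced scale for 128 of the 168 hours in each week, with charges pro-rated by the hour. Rounding down to whole dollars, this reconstruction matches the figures above.

```python
import math
from fractions import Fraction

# Monthly prices quoted in this post (Australia East region).
P6_MONTHLY = 5176
S0_MONTHLY = 21
A3_MONTHLY = 412

# Assumption: 40 business hours (8 h x 5 weekdays) out of 168 hours per week.
OFF_HOURS_FRACTION = Fraction(168 - 40, 168)

baseline = P6_MONTHLY + 4 * A3_MONTHLY  # P6 database plus four A3 instances
# Three of the four instances are removed out of hours.
cloud_saving = math.floor(3 * A3_MONTHLY * OFF_HOURS_FRACTION)
# Out of hours the P6 charge is replaced by the S0 charge.
db_saving = math.floor((P6_MONTHLY - S0_MONTHLY) * OFF_HOURS_FRACTION)
total_saving = cloud_saving + db_saving

print(baseline, cloud_saving, db_saving, total_saving)  # 6824 941 3927 4868
print(f"{total_saving / baseline:.1%}")                 # 71.3%
```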

Reducing human effort

Within all IT environments there are always repetitive tasks that need to be performed. These will of course vary greatly between customers, but let’s consider some operations that are likely to be regular tasks in many environments.

Managing resource sprawl

When you deploy a service in Microsoft Azure, you immediately commit yourself to an ongoing cost. Remember the early days of virtualisation, when server sprawl became a thing? In the public cloud, sprawl is also a very real, potentially less obvious, and expensive issue.

To illustrate, let’s consider the following:

An Azure SQL Premium P6 database running in the Australia East region costs $5,176 per month. Take a copy of that database, forget to scale the copy to a different service tier, and you have just spent another $5,176 a month.

Equally, an A3 Cloud Service VM (4 cores, 7 GB RAM, 999 GB disk), again running in the Australia East region, costs $412 per month. If you have AutoScale enabled on your Cloud Services and an additional instance is instantiated, then boom: another $412 per month of additional cost.

The task of manually auditing all of the resources deployed in Azure, across multiple Subscriptions, can take hours of human effort per day just to keep on top of these simple things. As people are generally expensive, the cost of managing this manually quickly becomes prohibitive.

So what has automation done to help here?

Every day, our automation collects a complete inventory of all deployed Virtual Machines, Cloud Services and SQL databases across all Subscriptions. The consolidated results are delivered via email, making irregularities (such as large numbers of server instances or new Premium tier databases) easy to identify.
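Collecting the inventory depends on the Azure APIs or tooling in use, so the sketch below covers only the second half: given a consolidated inventory, flag the kinds of irregularity mentioned above. The `Resource` record, the threshold of four instances and the list of already-known Premium databases are hypothetical; in practice the findings would be formatted into the daily email.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List

@dataclass
class Resource:
    subscription: str
    name: str
    kind: str               # hypothetical labels: "VirtualMachine", "CloudService", "SqlDatabase"
    instance_count: int = 1
    tier: str = ""          # service objective for databases, e.g. "P6" or "S0"

def find_irregularities(inventory: Iterable[Resource], max_instances: int = 4,
                        known_premium: FrozenSet[str] = frozenset()) -> List[str]:
    """Flag unusually large Cloud Services and Premium databases not seen before."""
    findings = []
    for r in inventory:
        key = f"{r.subscription}/{r.name}"
        if r.kind == "CloudService" and r.instance_count > max_instances:
            findings.append(f"{key}: {r.instance_count} instances")
        # Assumes Premium service objectives are the ones named P1, P2 and so on.
        if r.kind == "SqlDatabase" and r.tier.startswith("P") and key not in known_premium:
            findings.append(f"{key}: new Premium tier database ({r.tier})")
    return findings

inventory = [
    Resource("Prod", "web", "CloudService", instance_count=6),
    Resource("Prod", "orders", "SqlDatabase", tier="P6"),
    Resource("Test", "orders_copy", "SqlDatabase", tier="P6"),
    Resource("Test", "uat", "SqlDatabase", tier="S0"),
]
for line in find_irregularities(inventory, known_premium=frozenset({"Prod/orders"})):
    print(line)
# Prod/web: 6 instances
# Test/orders_copy: new Premium tier database (P6)
```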

It’s worth mentioning that Azure Cloud Services have both a Production and a Staging slot. This allows a new Cloud Service deployment to be made to Staging and then “swapped” into Production. Resources in the Staging slot cost exactly the same as their Production counterparts, yet deliver nothing to end users. So our automation also reports any resources in the Staging slot, and if an imminent swap is not scheduled, we actively seek to remove them and their non-productive cost.
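The Staging slot check can be sketched in the same way. The record type, the scheduled-swap date (assumed to come from a change calendar) and the two-day grace period are all hypothetical; the sketch reports each Staging deployment with no imminent swap, together with its monthly cost.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Tuple

@dataclass
class StagingDeployment:
    cloud_service: str
    instance_count: int
    monthly_cost_per_instance: float
    scheduled_swap: Optional[date] = None  # hypothetical: from a change calendar

def staging_report(deployments: Iterable[StagingDeployment], today: date,
                   grace_days: int = 2) -> List[Tuple[str, float]]:
    """List Staging deployments with no swap due within `grace_days`, with their monthly cost."""
    report = []
    for d in deployments:
        swap_imminent = (d.scheduled_swap is not None
                         and 0 <= (d.scheduled_swap - today).days <= grace_days)
        if not swap_imminent:  # includes overdue swaps, whose date is already past
            report.append((d.cloud_service, d.instance_count * d.monthly_cost_per_instance))
    return report

deployments = [
    StagingDeployment("portal", 2, 412, scheduled_swap=date(2015, 9, 24)),
    StagingDeployment("api", 3, 412),
]
print(staging_report(deployments, today=date(2015, 9, 23)))  # [('api', 1236)]
```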

Database copies

Taking a production database copy to refresh a UAT or training platform is extremely common. The process ensures that developers and test users are working with up-to-date, representative datasets.

In an Azure environment this can be quite labour-intensive, especially if production and test are delivered out of separate Subscriptions for cost isolation.

A typical process might look like this:

  1. Take a copy of the production database to an existing server in the production Subscription
  2. Export the database copy to Blob Storage in the test Subscription
  3. Delete the database copy, or you’re wasting money ;-)
  4. Import the database onto the test SQL server from Blob Storage
  5. Delete the backup from Blob Storage
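Because the process is a fixed sequence, it can be scripted end to end. A minimal sketch, assuming a hypothetical `ops` adapter with one method per step (step 1, for instance, could issue T-SQL such as `CREATE DATABASE Orders_copy AS COPY OF prodserver.Orders`); the method names and naming scheme are illustrative. The `try`/`finally` blocks make sure the clean-up steps 3 and 5 run even when an export or import fails, so a failure does not leave a costly copy behind.

```python
from typing import Any, List, Tuple

def refresh_test_database(ops: Any, source_db: str, target_db: str) -> None:
    """Run the five refresh steps back to back; `ops` is a hypothetical adapter."""
    copy_db = f"{source_db}_copy"
    bacpac = f"{source_db}.bacpac"

    ops.copy_database(source_db, copy_db)        # 1. copy within the production Subscription
    try:
        ops.export_to_blob(copy_db, bacpac)      # 2. export to Blob Storage in the test Subscription
    finally:
        ops.delete_database(copy_db)             # 3. always delete the copy, even if the export fails
    try:
        ops.import_from_blob(bacpac, target_db)  # 4. import onto the test SQL server
    finally:
        ops.delete_blob(bacpac)                  # 5. always remove the backup

class RecordingOps:
    """Stand-in adapter that records each call instead of touching Azure."""
    def __init__(self) -> None:
        self.calls: List[Tuple[str, ...]] = []

    def __getattr__(self, step: str):
        return lambda *args: self.calls.append((step, *args))

ops = RecordingOps()
refresh_test_database(ops, "Orders", "Orders_UAT")
print([call[0] for call in ops.calls])
# ['copy_database', 'export_to_blob', 'delete_database', 'import_from_blob', 'delete_blob']
```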

Now, each of these steps is pretty simple to execute, but with so many discrete steps the process is either time-consuming, if someone closely babysits it, or slow to complete, if there are delays between steps.

Fortunately, this is a well-defined data pipeline and is easily automated. By way of example, ViFX recently performed a similar exercise for a customer, migrating some 80-plus databases between SQL Servers. Executed manually at this scale, the above workflow would have taken weeks to achieve, and would probably have resulted in at least one resignation! With automation, the task was completed in only four days, and even that duration was dictated by the migration schedules imposed by the database owners and consumers. This is still an incredibly significant reduction in time and associated cost.
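At that scale, the owner-imposed schedules become part of the automation itself. A sketch of one way to handle them, assuming each database has an agreed migration date and `migrate` is a hypothetical callable wrapping the per-database workflow: each run migrates only the databases due that day and records failures rather than stopping, so one problem database does not hold up the rest of the batch.

```python
from datetime import date
from typing import Callable, Dict, List, Tuple

def migrate_due(windows: Dict[str, date], today: date,
                migrate: Callable[[str], None]) -> Tuple[List[str], Dict[str, str]]:
    """Migrate every database whose agreed date is `today`; return successes and failures."""
    done: List[str] = []
    failed: Dict[str, str] = {}
    for db in sorted(name for name, window in windows.items() if window == today):
        try:
            migrate(db)
            done.append(db)
        except Exception as exc:  # record the failure and carry on with the batch
            failed[db] = str(exc)
    return done, failed

windows = {"crm": date(2015, 9, 1), "billing": date(2015, 9, 1), "hr": date(2015, 9, 2)}

def fake_migrate(db: str) -> None:
    if db == "billing":
        raise RuntimeError("import timed out")

print(migrate_due(windows, date(2015, 9, 1), fake_migrate))
# (['crm'], {'billing': 'import timed out'})
```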

The possibilities are endless

These are but a few examples of where automation has transformed the availability of Azure environments that we manage, and also removed a lot of human burden from platform management tasks.

Hopefully some of these examples trigger a desire to look at your own Azure deployments and identify areas where automation could return benefits to your operation, from either a cost or effort perspective. The possibilities are endless after all.

Have you had any similar experiences of marked improvements from automation, or on the flip side any nightmare stories of unexpected cost due to lack of automation? Feel free to share your stories or thoughts below.

Author: Andrew Jensen

Andrew specialises in the design and delivery of cloud and infrastructure solutions to meet the complex requirements of today's customer.

23 September 2015