Wednesday, June 17, 2020

Cloud-Native Design Techniques - Scale

Designing systems for the cloud, or to be cloud-native, is all the rage right now. Numerous articles have been written about this, but most of them come across like partial cooking instructions where the reader is expected to already know the recipe. I want to talk about cloud-native design without assuming any prior knowledge.

For the past 14 years, I've worked on a system that processes electronic payment transactions. When we built this system, we specifically designed the applications to handle scale. We were betting the business on being able to scale the software linearly with business growth. What exactly does this mean and how is it done?

First, let's define what it means for an application to "scale." In the simplest terms, an application "scales" when it can handle more requests or load without changing the application itself, without failing, and generally without significant changes to the infrastructure that runs it. Let's take an order-handling system as an example. The code below loosely defines this system:
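
(This is a rough F# sketch; the Order type and the function bodies are placeholders that only show the shape of the system.)

type Order = { Id: int; Items: string list }

let ValidateOrder (order: Order) =
    // placeholder validation
    not (List.isEmpty order.Items)

let BillOrder (order: Order) =
    // imagine a round trip to the billing database here
    printfn "billing order %d" order.Id

let ShipOrder (order: Order) =
    printfn "shipping order %d" order.Id

let HandleOrder (order: Order) =
    if ValidateOrder order then
        BillOrder order   // waits on the database before anything else happens
        ShipOrder order   // only runs once billing has finished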

We can see in the code that three activities occur for a new order: 1) the order is validated, 2) the order is billed, and 3) the order is shipped. In this example, the order must be validated before the other two activities can happen, but billing and shipping do not depend on each other. We can also see that billing an order interacts with a database. Database interactions generally require requests over a network and can be constrained by the resources available to the database server, so this operation could be slow or contentious. If orders occur infrequently, this approach is fine and will work well enough. If orders are streaming in rapidly, it will either cause significant delays in processing orders or crash the entire system as resources get overwhelmed.

There are three ways we could easily modify this system to handle scale. The first approach would be to upgrade the server it runs on, or to run it on multiple servers. A hardware-based approach is the easiest for developers because they don't really have to do anything, but it may not be the best long-term solution. Upgrading the existing server may allow for more orders to be handled for a while, but the system will fail again as the number of orders increases. Running this system on multiple servers may work, or it may cause more contention in shared resources, like a database, and make the system fail even sooner. If this application is running on a virtual machine in a public cloud, modifying the underlying hardware is also an expensive way to make it scale.

A second approach would be to run the HandleOrder method in a new thread for each order. This approach isn't terrible, but it isn't as easy as it sounds. Simply creating a new thread for each order may work, but that assumes everything in our HandleOrder method is thread-safe. Most database operations are not thread-safe, so the order billing operation will be an issue. In my experience, a lot of code is only thread-safe by accident, not because it was intentionally written that way. Using threading to scale this application may be a valid approach, but it requires diligence and understanding to get it right. The other limitation of this approach is that at some point the resources of the server will be exhausted and the app will no longer scale. When this happens, a hardware approach will be necessary.
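
To make that concrete, the naive thread-per-order version would look something like this sketch, which reuses the HandleOrder function from the earlier listing and ignores error handling entirely:

open System.Threading

// One thread per incoming order. This only helps if everything HandleOrder
// touches is thread-safe, and the server will eventually run out of threads
// and memory as order volume grows.
let HandleOrderOnThread (order: Order) =
    let worker = Thread(fun () -> HandleOrder order)
    worker.IsBackground <- true
    worker.Start()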

The third approach to making this system scale would be to break up the operations and handle them independently. We see that the order validation is the only required step for billing or shipping an order. The system can be broken up into three pieces, two of which can be called independently. The code below illustrates how the separation could occur:
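
(Again, this is deliberately simple F# to illustrate the idea; the async wrapping is just one way to run the two calls at the same time.)

// HandleOrder now only validates and orchestrates. Billing and shipping
// run at the same time because neither depends on the other.
let HandleOrder (order: Order) =
    if ValidateOrder order then
        [ async { BillOrder order }
          async { ShipOrder order } ]
        |> Async.Parallel
        |> Async.Ignore
        |> Async.RunSynchronously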

The code above separates the billing and shipping calls and allows them to run simultaneously since they aren't dependent on each other. (This code is overly simplistic to illustrate my point.) Breaking the system into constituent parts allows us to scale the BillOrder function separately from the ShipOrder function. As our order volume grows, we can run each of them on its own, using the HandleOrder function as an orchestrator for the work. In a public cloud environment, this approach would allow us to run the billing and shipping functions as either serverless functions or independent Docker containers, both of which are cheaper options than running full virtual machines.
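
As a taste of what that could look like, here is a sketch of BillOrder as its own queue-triggered Azure Function. The queue name, the string payload, and the wiring are all assumptions on my part, and a real project would also need the storage queue extension package:

open Microsoft.Azure.WebJobs
open Microsoft.Extensions.Logging

module BillOrderFunction =
    // Runs whenever a message lands on the (hypothetical) "orders-to-bill"
    // queue, so billing can scale independently of everything else.
    [<FunctionName("BillOrder")>]
    let Run ([<QueueTrigger("orders-to-bill")>] orderId: string) (log: ILogger) =
        log.LogInformation(sprintf "Billing order %s" orderId)
        // the actual billing work would go here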

Designing for scale requires an architect to see the divisions in an application and to separate the system along those divisions. Once that separation is properly done, the system is better prepared to take advantage of cloud technologies to handle scale in a more cost-effective manner.

Wednesday, June 3, 2020

F# and Azure Functions

Lately I've been creating tools for my wife to use with her classroom to help with distance learning. I've implemented these tools using Azure Functions. Azure Functions are the serverless computing option on the Azure platform. I picked these because I'm familiar with Azure and they are a quick, cheap option for standing up an API.

Another reason I picked Azure Functions is that they support creating functions in F#. I have been able to create a function project and a function in F#, and at times run it locally, but I have struggled to get an F# function working in the cloud. I assumed this was a combination of spotty support for F# in some areas and my lack of understanding about how to get F# to play nicely as an Azure Function.

I have a new project I want to start as a series of Azure Functions, so I wanted to give F# another try. In my reading, I found this step-by-step guide by Luis Quintanilla for manually creating Azure Functions in F#. Luis does a great job walking the reader through creating a function in F#, without relying on the built-in templates provided by Microsoft. I found this guide fantastic because it gave me a better understanding of the composition of an Azure Function, AND it got me closer to successfully deploying an F# function.

The guide isn't perfect. It's missing some steps around adding the packages that the function itself references to the project (easy to do). This "gap" helped me discover some version discrepancies between my installed version of the .NET Core SDK and the Azure Functions CLI tool. At first it appeared to be an issue with my project templates being out of date, but updating the templates didn't seem to fix anything. Ultimately, I updated the .NET Core SDK to the latest version and upgraded my version of the Azure Functions CLI.
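
If you run into the same gap, the fix is an ordinary dotnet add package call. For example (this is the most common package for a Functions project, not necessarily the exact one the guide omits):

dotnet add package Microsoft.NET.Sdk.Functions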

At this point, the function I created by following the guide worked both locally and in the cloud! I went back and re-created my function using the command-line tools, and that worked as well. Using version 3.1.300 of the .NET Core SDK and version 3.0.2534 of the Azure Functions CLI tools (which target version 3.0.133353.0 of the Azure Functions runtime), the following steps will successfully create an Azure Function in F# that works in the cloud:

mkdir HelloWorld
cd HelloWorld
dotnet new func --language F#
dotnet new http --language F# --name PrintHello
dotnet build
func start

The logging displayed in the console from the func start command will provide the URL for the local function. Pointing your browser to that URL will activate the locally running function.
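
With the default host settings, that URL usually looks something like the following (the 7071 port and the api route prefix are defaults and can vary with your configuration):

curl "http://localhost:7071/api/PrintHello?name=Azure"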

I'm now ready to start creating larger applications in F#!