Hierarchical cloud designs? Don’t forget the network!
In my job, I get to spend a significant amount of time with cloud architects who are designing scalable POD architectures for their cloud data centers. The size of each POD is fixed, and scale is attained by organizing multiple PODs into a hierarchical layout. Hierarchical designs are a great way to achieve scale, but for them to be successful, it’s critical to carefully select a minimum set of properties to scale across PODs, and to constrain as many properties as practical within each POD. If you expect all properties to scale across PODs, then your POD is not really a useful building block anymore. A manageable architecture requires you to maximize the constraints (in order to reduce cost and/or complexity) without impacting the agility of the resulting cloud infrastructure. More constraints simplify the design, but the wrong constraints limit the usefulness of the solution.
Relevant metrics increase visibility
When it comes to networking design and constraints, a very common question I deal with is: can I constrain my Layer 2 networks to live within a POD? Or do I need my Layer 2 networks to roam freely across PODs? The discussion is typically framed around the number of VMs per tenant, since tenants might want Layer 2 adjacency for their VMs. As the number of “VMs per tenant” rises, it becomes more likely that Layer 2 will need to cross POD boundaries – or so goes the conventional wisdom.
Unfortunately, from a networking perspective, “VMs per tenant” is a misleading metric. Networks are traditionally organized into IP subnets: only intra-subnet flows use Layer 2 connectivity, while inter-subnet flows require IP connectivity at Layer 3. A single metric (“VMs per tenant”) that collapses these two separate layers into one is bound to cause confusion. A more pertinent set of metrics is the number of “VMs per subnet”, to size your Layer 2 requirements, and the number of “subnets per tenant”, to size your Layer 3 (IP) requirements. Disaggregating “VMs per tenant” into “VMs per subnet” and “subnets per tenant” is key to gaining the visibility you need for an appropriate network design for your cloud.
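To make the disaggregation concrete, here is a minimal Python sketch. The `Tenant` class and its fields are hypothetical, assuming a tenant can be described simply as a list of subnet sizes. Two tenants with the same “VMs per tenant” can have very different Layer 2 and Layer 3 requirements.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tenant:
    """Hypothetical tenant model: one entry per subnet, giving that subnet's VM count."""
    name: str
    subnet_sizes: List[int]

    @property
    def vms_per_tenant(self) -> int:
        # The single collapsed metric: total VMs, regardless of subnet layout.
        return sum(self.subnet_sizes)

    @property
    def max_vms_per_subnet(self) -> int:
        # Sizes the Layer 2 requirement: the largest broadcast domain.
        return max(self.subnet_sizes, default=0)

    @property
    def subnets_per_tenant(self) -> int:
        # Sizes the Layer 3 (IP) requirement: how many subnets must be routed.
        return len(self.subnet_sizes)


# Same VMs per tenant (1,000), very different Layer 2 / Layer 3 profiles.
a = Tenant("a", [250] * 4)
b = Tenant("b", [50] * 20)
for t in (a, b):
    print(t.name, t.vms_per_tenant, t.max_vms_per_subnet, t.subnets_per_tenant)
# a 1000 250 4
# b 1000 50 20
```

Only `max_vms_per_subnet` matters for deciding whether Layer 2 can stay inside a POD; `subnets_per_tenant` is a routing concern.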
Using the “VMs per subnet” metric dramatically simplifies the question: can I constrain a subnet within a POD, or do I need individual subnets to span across PODs? Answering this question is much easier, because unlike the maximum number of VMs per tenant (which may not be known ahead of time), the maximum number of VMs per subnet is a known, fixed quantity, independent of the number of tenants and total size of each tenant.
IP subnet design and POD fragmentation
So, what’s the maximum number of VMs that should live in a subnet? Different people have different answers. Among other factors, the answer depends on their tolerance for broadcast noise: each additional VM in a subnet adds a fixed amount of broadcast traffic that every other VM in the subnet is forced to process. The chatter grows linearly with the size of the subnet, and so does the CPU each VM spends handling it. There seems to be general consensus on keeping the number of VMs per subnet in the 100–200 range, and never exceeding 500. Let’s pick a number and say we can tolerate up to 250 VMs per subnet.
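The linear growth can be illustrated with a toy model. This sketch assumes, purely hypothetically, that every VM emits broadcasts at the same steady rate and that each broadcast reaches every other VM in the subnet; the function name and the rate value are illustrative, not measured.

```python
def broadcast_rx_per_vm(vms_in_subnet: int, pps_per_vm: float) -> float:
    """Broadcast packets/s each VM must receive and process.

    Toy model (assumption): every VM emits broadcasts at the same rate
    pps_per_vm, and every broadcast is delivered to all other VMs.
    """
    if vms_in_subnet < 1:
        return 0.0
    return (vms_in_subnet - 1) * pps_per_vm


RATE = 2.0  # hypothetical per-VM broadcast rate; real rates vary by workload
for n in (100, 200, 250, 500, 1000):
    print(n, broadcast_rx_per_vm(n, RATE))
# 100 198.0
# 200 398.0
# 250 498.0
# 500 998.0
# 1000 1998.0
```

Doubling the subnet roughly doubles the broadcast load on every VM, which is why a hard per-subnet cap is a reasonable design constraint.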
If the capacity of a POD is, say, 10,000 VMs, and the maximum size of a subnet is 250 VMs, the most difficult scenario is when fewer than 250 VMs remain available in the POD and a tenant asks for a subnet of exactly 250 VMs. While it’s true that you can’t satisfy the request because of a constraint you have introduced in your design, it’s also true that you only hit the problem when less than 2.5% of the POD’s capacity remains (250 / 10,000 = 2.5%). When you have less than 2.5% capacity available, isn’t that a good time to create a new POD anyway?
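The arithmetic above fits in a couple of lines. This is a minimal sketch with hypothetical function names, assuming that Layer 2 is constrained to the POD, so a subnet must be placed entirely within a single POD.

```python
def worst_case_stranded_fraction(pod_capacity_vms: int, max_subnet_vms: int) -> float:
    """Upper bound on the fraction of a POD left unusable for a max-size subnet.

    A full-size subnet request fails only when fewer than max_subnet_vms
    VMs remain, so fewer than max_subnet_vms VMs can ever be stranded.
    """
    return max_subnet_vms / pod_capacity_vms


def can_place_subnet(free_vms: int, requested_vms: int) -> bool:
    # With Layer 2 constrained to the POD, the whole subnet must fit in one POD.
    return requested_vms <= free_vms


print(worst_case_stranded_fraction(10_000, 250))  # 0.025, i.e. 2.5%
print(can_place_subnet(249, 250))                 # False: time for a new POD
print(can_place_subnet(249, 100))                 # True: smaller subnets still fit
```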
What I have just described is nothing more than a manifestation of a fragmentation problem, which is common to any form of resource allocation. Fragmentation refers to having residual resources available, but scattered in such a way that they can’t always be consumed. Say you have 10 storage devices, 100GB each. You have stored files of different sizes on your devices, and now each individual device is left with less than 2.5GB available. In aggregate you have approximately 25GB of unused space, but the available capacity is fragmented across 10 different devices. If your next file is 5GB in size, it can’t be hosted on any individual device. While the residual capacity is not lost and you can still store smaller files, your 5GB file requires an extra device.
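The storage example can be reproduced with a simple first-fit check. The free-space values below are made up for illustration (each under 2.5GB), and `first_fit` is a hypothetical helper, not part of any storage API.

```python
from typing import List, Optional


def first_fit(free_gb: List[float], file_gb: float) -> Optional[int]:
    """Return the index of the first device with room for the file, or None."""
    for i, free in enumerate(free_gb):
        if file_gb <= free:
            return i
    return None


# Hypothetical residual space on ten 100GB devices, each with under 2.5GB free.
free = [2.4, 2.4, 2.3, 2.4, 2.4, 2.3, 2.4, 2.4, 2.4, 2.4]
print(round(sum(free), 1))   # 23.8: plenty of space in aggregate
print(first_fit(free, 5.0))  # None: no single device can hold a 5GB file
print(first_fit(free, 2.0))  # 0: smaller files still fit
```

The aggregate capacity is not lost; it just can’t be used by a request larger than the biggest fragment, which is exactly the situation of a full-size subnet in a nearly full POD.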
Moderate levels of fragmentation are typically tolerable, and there’s no reason cloud designs should be any different.
Hierarchical POD designs demand hierarchical network designs
Going back to the original question: can you constrain your Layer 2 networks to live within a POD? Or do you need Layer 2 networks to roam freely across PODs? While the answer still depends on your requirements, you have to think about it in terms of the right metrics. For your networking decisions, ignore the unpredictable “VMs per tenant” metric. Instead, tune the size of your PODs (VMs per POD) based on the maximum size of a subnet (VMs per subnet) and your tolerance for fragmentation (as a percentage of the total POD size). The great news is that none of these parameters depends on the characteristics of your tenants, which leaves you in full control of your design: you can fix the one parameter that matters most to you and tune the other two. Chances are you’ll easily find a good combination that allows you to constrain Layer 2 within a POD without sacrificing any flexibility in the networking options you offer your tenants. You’re much better off constraining Layer 2 to the POD and letting Layer 3 do what it does best: scale across PODs. After all, isn’t that what a hierarchical POD design is all about? Scale with hierarchies – flat environments don’t scale. That’s definitely true for network designs, too. As the old networking adage goes, “route when you can, bridge when you must.”
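The three knobs are tied together by one relationship: worst-case fragmentation = VMs per subnet / VMs per POD. A minimal sketch of fixing two and solving for the third might look like this; the function names are hypothetical, and tolerance is expressed as an exact `Fraction` so the results don’t suffer from floating-point rounding at the boundaries.

```python
import math
from fractions import Fraction

# Relationship (assumed from the fragmentation argument above):
#     tolerance = max_subnet_vms / pod_vms


def min_pod_vms(max_subnet_vms: int, tolerance: Fraction) -> int:
    """Smallest POD in which a max-size subnet strands at most `tolerance` of capacity."""
    return math.ceil(max_subnet_vms / tolerance)


def max_vms_per_subnet(pod_vms: int, tolerance: Fraction) -> int:
    """Largest subnet a POD of this size can offer within the tolerance."""
    return math.floor(pod_vms * tolerance)


def worst_case_fragmentation(pod_vms: int, max_subnet_vms: int) -> Fraction:
    """Worst-case stranded fraction of a POD for a given maximum subnet size."""
    return Fraction(max_subnet_vms, pod_vms)


tol = Fraction("0.025")                              # 2.5%
print(min_pod_vms(250, tol))                         # 10000
print(max_vms_per_subnet(10_000, tol))               # 250
print(float(worst_case_fragmentation(4_000, 200)))   # 0.05, i.e. 5%
```

For example, if 500 VMs per subnet is non-negotiable and you only tolerate 1% fragmentation, `min_pod_vms(500, Fraction(1, 100))` says your PODs need at least 50,000 VMs; if that’s too large, relax the tolerance or the subnet cap instead.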