Platform, Security, Workplace
In Part 1 we covered how to get a working network, and in Part 2 it was all about making it secure. In Part 3 we focus on Azure Load Balancing: distributing traffic intelligently across your resources so that no single server becomes a bottleneck. Done right, load balancing means that a failed VM does not take down your application, and users in Tokyo get routed to Tokyo rather than unnecessarily travelling to West Europe and back.
Azure offers four distinct load balancing services, and they are not interchangeable; picking the wrong one is one of the most common architectural mistakes in Azure deployments. Each operates at a different layer, solves a different problem and comes with a different cost and complexity profile. Understanding these differences before you deploy will save you a painful migration later on, and a lot of explaining to a manager why the ‘simple load balancer’ you chose cannot actually do SSL termination.
In this part we cover all four services in depth and compare them directly, so you end up with a clear decision guide for picking the right one for your scenario without guesswork. What you will learn in Part 3: Azure Load Balancer (Layer 4), Application Gateway (Layer 7), Azure Front Door (global Layer 7), Azure Traffic Manager (DNS-based routing), how they compare, and a practical decision guide for choosing between them.
In a traditional network, load balancing usually meant one thing: you put an F5 or a Citrix ADC in front of your servers, distributed connections across the pool, configured health checks, picked a distribution algorithm (round robin, least connections or source IP affinity) and moved on. That device handled both Layer 4 TCP distribution and Layer 7 HTTP routing, often in the same product.
In Azure these responsibilities are split into dedicated services, each optimised for a specific scenario, and that is actually a better architecture: a Layer 4 load balancer can operate at extremely high throughput with very low latency because it does not try to do HTTP routing at the same time. It does mean, however, that you need to understand which layer and which scenario applies before choosing a service. You have four options in Azure:

- Azure Load Balancer (Layer 4)
- Application Gateway (Layer 7)
- Azure Front Door (global Layer 7)
- Azure Traffic Manager (DNS-based routing)
Each of these services sits on a spectrum from low-level packet distribution to global DNS-based traffic steering.
Azure Load Balancer is the foundational service. It operates at Layer 4 of the OSI model, which means it distributes traffic based on IP addresses and TCP/UDP ports with no understanding of what is inside those packets. It does not read HTTP headers, it does not inspect URLs and it does not terminate TLS. It simply takes a connection that arrives at the frontend IP and distributes it to one of the backend pool members based on a five-tuple hash: source IP, source port, destination IP, destination port and protocol.
The result is a fast and scalable service: Azure Load Balancer can handle millions of flows simultaneously with sub-millisecond latency, which makes it the right choice for high-throughput and non-HTTP workloads.
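To make the five-tuple behaviour concrete, here is a minimal Python sketch of hash-based flow distribution. It is purely illustrative (Azure's actual hashing is internal to the platform), but it shows why every packet of a given TCP flow lands on the same backend while different flows spread across the pool.

```python
import hashlib

# Hypothetical backend pool members
backends = ["10.0.1.4", "10.0.1.5", "10.0.1.6"]

def pick_backend(src_ip, src_port, dst_ip, dst_port, protocol):
    """Illustrative five-tuple hash: same flow -> same backend."""
    flow = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{protocol}"
    digest = hashlib.sha256(flow.encode()).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]

# Every packet of this flow maps to the same backend...
print(pick_backend("203.0.113.10", 51544, "20.54.1.1", 443, "TCP"))
# ...while a new source port (a new flow) may map to a different one.
print(pick_backend("203.0.113.10", 51545, "20.54.1.1", 443, "TCP"))
```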
There are two flavours of Azure Load Balancer, and the distinction matters for architecture design. A public load balancer has a public frontend IP and distributes inbound internet traffic across backend virtual machines; the classic use case is an internet-facing web tier. An internal load balancer has a private frontend IP within your VNet and distributes traffic between internal services; this is the right choice for an application tier or database cluster that should never be directly internet facing.
In a typical three-tier application you might use a public load balancer in front of your web tier and an internal load balancer between your web tier and application tier. Both are the same service; the only difference is whether the frontend IP is public or private.
The load balancer continuously probes each backend pool member to determine whether it is healthy and capable of receiving traffic. When a probe fails, for example because a VM is down, unresponsive or the application has crashed, the load balancer stops sending new connections to that member until the probe succeeds again.
Health probes can be configured as TCP (just checks whether the port is open), HTTP (checks whether a specific URL returns a 200 response) or HTTPS (the same as HTTP, but over TLS). For any real workload, use an HTTP or HTTPS probe against an application health endpoint rather than a raw TCP check: a web server can have an open TCP port while the application behind it is completely broken.
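As an illustration of why an application-level probe beats a raw TCP check, here is a minimal health endpoint sketch using only Python's standard library. The /health path, port and dependency check are hypothetical; the point is that the endpoint only returns 200 when the application itself, not just the listening socket, is healthy.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def application_is_healthy() -> bool:
    # Hypothetical check: verify the app can reach its database, cache, etc.
    # A raw TCP probe would pass as long as the port is open,
    # even if this check would fail.
    return True

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health" and application_is_healthy():
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"OK")
        else:
            self.send_response(503)  # probe fails, LB stops sending new connections
            self.end_headers()

if __name__ == "__main__":
    # Point the load balancer HTTP probe at http://<vm-ip>:8080/health
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```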
There are two SKUs. Basic is free, limited in scale and has no SLA of its own, and it is effectively retired for new deployments. That leaves the Standard SKU, which in my opinion is the one you should always use anyway; my rule is to never use a Basic service in Azure, not even for testing.
The Standard SKU supports availability zones, has a 99.99% SLA, integrates with Azure Monitor for detailed metrics, supports larger backend pools and is required for most enterprise architecture patterns. The cost difference is minimal relative to the VM costs it is load balancing.
The traditional equivalent of Azure Load Balancer is an F5 BIG-IP or Citrix ADC running in TCP pass-through mode, or a classic hardware load balancer like A10 or Kemp. The key difference is that the Azure service is fully managed: no appliance to patch, no capacity to pre-provision, and it scales automatically with your traffic.
An often overlooked function of Azure Load Balancer is outbound connectivity: backend VMs use it to download updates, call external APIs or reach Azure services via public endpoints. The load balancer performs Source Network Address Translation (SNAT), translating the VM's private IP to one of the load balancer's public IPs. In high-scale scenarios SNAT port exhaustion becomes a real problem: each public IP provides approximately 64,000 SNAT ports, and a busy backend pool can exhaust these quickly.
For workloads with heavy outbound traffic, either add multiple public IPs to the load balancer or, better yet, use an Azure NAT Gateway for outbound connectivity; it is designed specifically for this scenario and provides far more SNAT ports.
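A rough back-of-the-envelope calculation shows why exhaustion happens faster than you might expect. The figures below are approximations for illustration only; check the current Azure documentation for the exact allocation rules, which depend on backend pool size and outbound rule configuration.

```python
# Rough SNAT capacity estimate (illustrative figures only)
ports_per_public_ip = 64_000   # approximate SNAT ports available per frontend public IP
public_ips_on_lb = 1
backend_vms = 20

ports_per_vm = (ports_per_public_ip * public_ips_on_lb) // backend_vms
print(f"~{ports_per_vm} SNAT ports per VM")  # ~3200 with one IP and 20 VMs

# Each outstanding outbound connection to the same destination IP:port consumes
# one SNAT port, so a chatty service making thousands of concurrent calls to a
# single external API can run out of ports long before CPU or bandwidth limits.
```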
Where Azure Load Balancer stops at the transport layer, Application Gateway starts at the application layer: it is a fully managed Layer 7 load balancer designed specifically for HTTP and HTTPS traffic, and it understands the content of web requests in a way that Azure Load Balancer simply cannot.
Because Application Gateway terminates TLS and inspects the full HTTP request before forwarding it to a backend, it can make routing decisions based on URL paths, hostnames, query strings and HTTP headers. This unlocks a range of capabilities that are essential for modern web architectures, for example:
- Path-based routing: requests to /api/* go to one backend pool and requests to /images/* to another, all behind a single public IP.
- Multi-site hosting: app1.contoso.com goes to one backend; app2.contoso.com goes to another.
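The following Python sketch is a conceptual model of the kind of Layer 7 decisions described above. The rule structure and pool names are made up and do not reflect the actual Application Gateway configuration model; it simply shows how host header plus URL path selects a backend pool.

```python
# Illustrative Layer 7 routing table (not actual Application Gateway config)
ROUTING_RULES = [
    {"host": "app1.contoso.com", "path_prefix": "/api/",    "backend_pool": "api-pool"},
    {"host": "app1.contoso.com", "path_prefix": "/images/", "backend_pool": "static-pool"},
    {"host": "app2.contoso.com", "path_prefix": "/",        "backend_pool": "app2-pool"},
]

def route(host: str, path: str) -> str:
    """Return the backend pool for a request, based on host header and URL path."""
    for rule in ROUTING_RULES:
        if host == rule["host"] and path.startswith(rule["path_prefix"]):
            return rule["backend_pool"]
    return "default-pool"

print(route("app1.contoso.com", "/api/orders/42"))  # -> api-pool
print(route("app2.contoso.com", "/login"))          # -> app2-pool
```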
Application Gateway includes an optional Web Application Firewall (WAF) that inspects inbound HTTP/S traffic for common web attack patterns. It is based on the OWASP Core Rule Set and protects against SQL injection, cross-site scripting (XSS), command injection, path traversal and many other common threats. For any public-facing web application, a WAF should be the default rather than an optional extra layer; the cost of a WAF policy is very low compared to the cost of a breach.
The WAF runs in two modes: detection mode logs threats but allows the traffic through, which is useful during initial deployment to identify false positives, and prevention mode blocks threats, which is what you want in production. Start in detection mode, review the logs, tune any false positives and switch to prevention once everything looks good. Jumping straight to prevention mode on a legacy application without reviewing the logs first will have your phone ringing within the hour.
Unlike Azure Front Door, which we cover in the next section, Application Gateway is deployed inside your VNet in its own dedicated subnet. This means it integrates naturally with your NSGs, UDRs and Private Endpoints. Backend pools can include VM IP addresses, VMSS instances, App Services or Private Endpoints, which gives you a clean path to load balance traffic to backends that have no public IP address. For most single-region web applications, Application Gateway is the right choice for inbound HTTPS traffic.
The traditional equivalent of Application Gateway is an F5 BIG-IP or Citrix ADC in full proxy mode: a proper application delivery controller with SSL offload, content switching and WAF. The difference, as always, is that the Azure service is fully managed and scales without you having to touch anything.
Azure Front Door solves a different problem than the two services we just covered. Where Azure Load Balancer and Application Gateway distribute traffic within a region, Front Door distributes traffic across regions, routing users to the nearest healthy backend across multiple Azure regions or any internet-accessible origin.
Front Door operates at Azure's global network edge, with points of presence in over 100 locations worldwide. When a user in Singapore makes a request to your application, the Front Door edge node in Singapore handles the TLS handshake, inspects the request and forwards it to the nearest healthy backend over the Microsoft backbone, rather than sending the traffic over the public internet the entire way. This alone can significantly reduce latency for globally distributed users.
| Capability | Application Gateway | Azure Front Door |
|---|---|---|
| Scope | Single region | Global, multi-region |
| TLS termination | ✓ (at gateway, inside VNet) | ✓ (at edge, globally) |
| WAF | ✓ | ✓ |
| URL path routing | ✓ | ✓ |
| CDN / caching | ✗ | ✓ |
| Global failover | ✗ | ✓ (automatic, seconds) |
| Anycast routing | ✗ | ✓ |
| Backend type | VNet resources only | Any internet-accessible origin |
| Deployed inside VNet | ✓ | ✗ (Microsoft-managed edge) |
Front Door has been consolidated, and that is a good thing: the classic SKU and the legacy CDN products have been merged into the Front Door Standard and Premium SKUs. The Standard SKU covers CDN, caching, WAF and global load balancing for most scenarios. Premium adds Private Link integration, so your backends can remain completely private even when served through Front Door, plus an advanced WAF with bot protection and detailed security analytics.
If your backends are Private Endpoints (and for sensitive and critical workloads they should be) or internal resources, Premium is required. For public backends, Standard is usually sufficient.
A common architectural mistake with Front Door is forgetting to lock down the backends. Front Door sits in front of your Application Gateway or web servers, but if those backends still have public IPs, an attacker can bypass Front Door entirely by hitting them directly, bypassing your WAF in the process. To prevent this, restrict inbound traffic on your backends to the AzureFrontDoor.Backend service tag, or use Front Door Premium with Private Link backends so there is no public exposure at all.
Check out the Microsoft guidance on securing Front Door for the recommended patterns.
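As a supplementary check on top of the service tag restriction, you can also verify that requests actually came through your Front Door instance, which forwards its instance ID in the X-Azure-FDID header. The sketch below is a minimal standard-library illustration of that idea; the Front Door ID is a placeholder, and in practice you would typically enforce this check in a WAF or gateway rule rather than in application code.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder: replace with the actual Front Door ID of your own instance
EXPECTED_FRONT_DOOR_ID = "00000000-0000-0000-0000-000000000000"

class OriginHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Reject anything that did not arrive via the expected Front Door instance.
        if self.headers.get("X-Azure-FDID") != EXPECTED_FRONT_DOOR_ID:
            self.send_response(403)
            self.end_headers()
            return
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Served via Front Door")

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), OriginHandler).serve_forever()
```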
The traditional equivalent of Front Door is a combination of Akamai or Cloudflare for CDN and edge WAF with a global server load balancing solution like Akamai GTM or F5 DNS Load Balancer. The key difference is that Front Door is deeply integrated with Azure's private backbone, which makes the path from edge to origin faster and more secure than routing over the public internet.
Azure Traffic Manager is the oldest of these services and in some ways the simplest, but it is frequently misunderstood. Traffic Manager is not a load balancer in the traditional sense: it does not proxy traffic, terminate TLS or inspect packets. Instead it operates purely at the DNS layer. When a client resolves your application's DNS name, Traffic Manager returns the IP address of the most appropriate endpoint based on your configured routing method. The client then connects directly to that endpoint; Traffic Manager never sees the actual application traffic.
This distinction matters a lot, because it means Traffic Manager cannot inspect or modify traffic, has no WAF and cannot make routing decisions based on the content of requests. What it can do is steer users to the right endpoint globally based on health, geography, performance or weighting, with very low overhead, at DNS speed and, importantly, at low cost. I use it whenever I can, as it is the simplest and most cost-efficient solution.
| Routing Method | How It Works | Best For |
|---|---|---|
| Priority | Always sends to primary endpoint; fails over to secondary if primary is unhealthy | Active/passive disaster recovery |
| Weighted | Distributes traffic across endpoints according to assigned weights | Gradual traffic migration, A/B testing |
| Performance | Routes to the endpoint with the lowest latency for the client | Globally distributed apps, latency-sensitive workloads |
| Geographic | Routes based on the geographic location of the DNS query source | Data residency requirements, regional content |
| Multivalue | Returns multiple healthy endpoints in a single DNS response | Simple client-side load balancing |
| Subnet | Routes based on the IP subnet of the DNS query source | Routing office or ISP traffic to specific backends |
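To see how a DNS-level method such as Weighted from the table above behaves, here is an illustrative simulation. The endpoint names and weights are made up; the point is that Traffic Manager only decides which address to hand back per DNS query, so the configured split emerges across many resolutions rather than per request.

```python
import random
from collections import Counter

# Hypothetical weighted profile: 90% to the old deployment, 10% to the new one
endpoints = {"old-region.contoso.com": 90, "new-region.contoso.com": 10}

def resolve() -> str:
    """Simulate one DNS resolution under the Weighted routing method."""
    names, weights = zip(*endpoints.items())
    return random.choices(names, weights=weights, k=1)[0]

# Across many DNS queries the split approaches the configured weights.
print(Counter(resolve() for _ in range(10_000)))
```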
Because Traffic Manager works at the DNS layer, its failover speed is bounded by the time-to-live (TTL) value on the DNS response. By default, Traffic Manager sets a TTL of 300 seconds. If an endpoint fails, Traffic Manager detects the failure fairly quickly (health checks run every 30 seconds by default), but clients that have already cached the DNS response will keep trying the failed endpoint until their cache expires. To speed up failover, at the cost of increased DNS query volume, you can reduce the TTL to 60 seconds or lower. For genuine active-passive disaster recovery with fast failover, Front Door is the better choice: it fails over in seconds rather than minutes because it is a proxy rather than a DNS redirect.
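A rough worst-case failover estimate makes the point. The probe interval and TTL below come from the defaults mentioned above; the number of tolerated probe failures is an illustrative assumption, so verify the settings on your own profile.

```python
probe_interval_s = 30    # default health check interval mentioned above
tolerated_failures = 3   # assumed consecutive failures before the endpoint is marked unhealthy
dns_ttl_s = 300          # default TTL on the DNS response cached by clients

detection_s = probe_interval_s * (tolerated_failures + 1)
worst_case_s = detection_s + dns_ttl_s
print(f"Worst case: ~{worst_case_s // 60} minutes "
      f"({detection_s}s to detect + up to {dns_ttl_s}s of client DNS caching)")

# Lowering the TTL to 60s caps the client-side caching delay at about a minute,
# but the detection time stays the same.
```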
A common misconception is to describe Traffic Manager as a ‘global load balancer’. It is not; it is a global traffic director. It does not balance load across endpoints simultaneously the way Application Gateway or Azure Load Balancer do: it points clients to an endpoint and then gets out of the way. If you need actual request-by-request load distribution across regions, use Front Door.
Now we come to the part that should save you from the most common architectural mistakes. Read through the scenarios and pick the service that matches, or combine them where needed, which is more common than you might expect.
Start here:
| Scenario | Right Service | Why |
|---|---|---|
| Internal TCP load balancing between app and database tier | Azure Load Balancer (internal) | Layer 4, private, no HTTP needed |
| Public HTTPS web application, single region | Application Gateway + WAF | SSL termination, WAF, path routing |
| Non-HTTP workload (SQL, RDP, custom TCP) facing internet | Azure Load Balancer (public) | Layer 4 only, no HTTP awareness needed |
| Multi-region web app, users across the globe | Azure Front Door | Global edge, CDN, automatic failover |
| Active/passive DR failover between two regions | Azure Traffic Manager (priority) or Front Door | DNS failover for simple cases; Front Door for fast failover |
| Gradual migration from old to new deployment | Azure Traffic Manager (weighted) | Shift traffic percentage without DNS changes |
| Data residency — EU users must stay in EU | Azure Traffic Manager (geographic) | Routes by DNS query origin geography |
| Public web app behind Front Door with private backends | Front Door Premium + Application Gateway (private) | Edge WAF + private backend with no public IP |
| Microservices with path-based routing, internal only | Application Gateway (internal) or Azure Load Balancer | Depends on whether HTTP routing is needed |
In production environments, the answer is frequently not one service but two working together; the Front Door plus Application Gateway pattern from the table above is the most common example.
Cost note: Application Gateway and Front Door are priced on a combination of fixed hourly cost plus per-GB data processing and capacity units. For low-traffic development environments, the fixed cost of Application Gateway v2 can be disproportionate — in those cases, an Azure Load Balancer or even a simple reverse proxy on a VM may be more economical. For production, always use the right service rather than the cheapest one. See the Application Gateway pricing page and Front Door pricing page for current rates.
All four services compared side by side:
| Service | Layer | Scope | HTTP-aware | WAF | Use When |
|---|---|---|---|---|---|
| Azure Load Balancer | 4 (TCP/UDP) | Regional | ✗ | ✗ | Non-HTTP or high-throughput TCP/UDP workloads |
| Application Gateway | 7 (HTTP/S) | Regional | ✓ | ✓ | Single-region HTTPS apps needing SSL, routing, WAF |
| Azure Front Door | 7 (HTTP/S) | Global | ✓ | ✓ | Multi-region apps, global CDN, edge WAF |
| Traffic Manager | DNS | Global | ✗ | ✗ | DNS-based failover, geographic routing, traffic shifting |
Load balancing is one of those areas where the “just pick one and see” approach tends to end badly. The good news is that the decision framework is straightforward once you know what each service actually does — and now you do. In Part 4, we move on to hybrid connectivity and operations: the part where your carefully designed Azure network has to talk to the rest of the world, and you need the tools to know when it stops doing so.
Plan your load balancing layer before you deploy your first backend. Your future self, facing a migration from Load Balancer to Application Gateway at 3am, will be grateful.
This article is part of the Azure Networking series on larsschouwenaars.com. Read Part 1: VNets, Subnets, NSGs and Routing and Part 2: Azure Firewall, Private Endpoints and DNS Private Zones if you have not already.