02 - Reliability

High availability, failover strategies, and health monitoring for Azure Front Door

🎯 Reliability Design Principles

| Principle | Front Door Implementation |
| --- | --- |
| Design for failure | Multi-origin groups, health probes, automatic failover |
| Reduce single points of failure | Global anycast, multiple POPs, redundant origins |
| Test recovery procedures | Simulate origin failures, validate failover |
| Monitor health continuously | Health probes, alerts, diagnostic logs |
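
As one possible way to act on the "monitor health continuously" principle, the sketch below sends Front Door health probe and access logs to a Log Analytics workspace. It is a minimal example under assumptions: the parameter names `frontDoorName` and `logAnalyticsWorkspaceId` and the setting name `afd-diagnostics` are hypothetical, and the profile is assumed to exist already.

```bicep
// Hypothetical parameters: an existing Front Door profile and an existing workspace.
param frontDoorName string
param logAnalyticsWorkspaceId string

// Reference the existing Front Door Standard/Premium profile.
resource frontDoor 'Microsoft.Cdn/profiles@2023-05-01' existing = {
  name: frontDoorName
}

// Diagnostic setting scoped to the profile.
resource frontDoorDiagnostics 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = {
  name: 'afd-diagnostics'
  scope: frontDoor
  properties: {
    workspaceId: logAnalyticsWorkspaceId
    logs: [
      {
        category: 'FrontDoorHealthProbeLog' // Failed health probe results
        enabled: true
      }
      {
        category: 'FrontDoorAccessLog' // Per-request log, useful when investigating failover
        enabled: true
      }
    ]
  }
}
```

Alerts can then be built on top of the collected logs or on the profile's metrics; which signals to alert on depends on the workload.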

✅ Design Checklist

| # | Recommendation | Priority |
| --- | --- | --- |
| 1 | Deploy origins in multiple regions | 🔴 Critical |
| 2 | Configure health probes for all origins | 🔴 Critical |
| 3 | Use appropriate routing method (Active-Active or Active-Passive) | 🔴 Critical |
| 4 | Set request timeouts appropriately | 🟡 High |
| 5 | Use same hostname on Front Door and origins | 🟡 High |
| 6 | Disable session affinity for high reliability | 🟡 High |
| 7 | Enable caching to serve during origin outages | 🟢 Medium |
| 8 | Consider redundant traffic management for mission-critical | 🟢 Medium |

📊 Health Probes

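How Health Probes Work

Front Door edge locations send probe requests to every enabled origin at the configured interval. A 200 OK response counts as a successful sample; any other status code or a timeout counts as a failure. The load balancing settings then decide whether an origin is healthy based on its most recent samples.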

Health Probe Configuration

resource originGroup 'Microsoft.Cdn/profiles/originGroups@2023-05-01' = {
  name: 'og-api'
  parent: frontDoor
  properties: {
    healthProbeSettings: {
      probePath: '/health' // Custom health endpoint
      probeProtocol: 'Https' // Match your origin protocol
      probeRequestType: 'HEAD' // HEAD is lighter than GET
      probeIntervalInSeconds: 30 // Balance between detection speed and load
    }
    loadBalancingSettings: {
      sampleSize: 4 // Number of samples to evaluate
      successfulSamplesRequired: 3 // Healthy if 3 of 4 succeed
      additionalLatencyInMilliseconds: 50 // Latency sensitivity
    }
  }
}

Health Probe Best Practices

| Setting | Recommendation | Reason |
| --- | --- | --- |
| Probe Path | /health or /api/health | Dedicated endpoint that checks dependencies |
| Request Type | HEAD | Less overhead than GET |
| Interval | 30 seconds (default) | Lower = faster detection but more load |
| Sample Size | 4 | Avoid false positives from transient failures |
| Success Required | 3 of 4 | 75% success rate = healthy |
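
To see how the interval and sample settings interact, here is a rough back-of-the-envelope estimate. It is a simplification and not a documented formula: it assumes a single probe source, an origin that was fully healthy before the outage, and that every probe after the outage fails. With sample size $n$, required successes $k$, and probe interval $\Delta t$, the origin drops below the threshold once $n - k + 1$ of the most recent samples have failed:

$$
t_{\text{detect}} \approx (n - k + 1)\,\Delta t = (4 - 3 + 1) \times 30\ \text{s} = 60\ \text{s}
$$

Under the same assumptions, halving the interval to 15 seconds halves the estimate to about 30 seconds, at the cost of twice as many probe requests to each origin.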

🔄 Routing for Reliability

Active-Active Configuration

// Origin 1: West Europe
resource origin1 'Microsoft.Cdn/profiles/originGroups/origins@2023-05-01' = {
  name: 'origin-westeurope'
  parent: originGroup
  properties: {
    hostName: 'app-westeurope.azurewebsites.net'
    priority: 1 // Same priority = active-active
    weight: 1000 // Equal weight = equal distribution
    enabledState: 'Enabled'
  }
}

// Origin 2: East US
resource origin2 'Microsoft.Cdn/profiles/originGroups/origins@2023-05-01' = {
  name: 'origin-eastus'
  parent: originGroup
  properties: {
    hostName: 'app-eastus.azurewebsites.net'
    priority: 1 // Same priority
    weight: 1000 // Equal weight
    enabledState: 'Enabled'
  }
}

Active-Passive Configuration

// Primary Origin
resource originPrimary 'Microsoft.Cdn/profiles/originGroups/origins@2023-05-01' = {
  name: 'origin-primary'
  parent: originGroup
  properties: {
    hostName: 'app-primary.azurewebsites.net'
    priority: 1 // Primary
    weight: 1000
    enabledState: 'Enabled'
  }
}

// Backup Origin
resource originBackup 'Microsoft.Cdn/profiles/originGroups/origins@2023-05-01' = {
  name: 'origin-backup'
  parent: originGroup
  properties: {
    hostName: 'app-backup.azurewebsites.net'
    priority: 2 // Only used when priority 1 fails
    weight: 1000
    enabledState: 'Enabled'
  }
}

⏱️ Timeouts

Request Timeout Configuration

resource route 'Microsoft.Cdn/profiles/afdEndpoints/routes@2023-05-01' = {
  name: 'default-route'
  parent: endpoint
  properties: {
    originGroup: { id: originGroup.id }
    forwardingProtocol: 'HttpsOnly'
    // Default origin response timeout: 60 seconds
    // The timeout is set on the Front Door profile (originResponseTimeoutSeconds), not on the route
    // Adjust based on your origin's response time
  }
}
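
For Front Door Standard/Premium, the origin response timeout is a property of the profile and applies to all of its routes. A minimal sketch of setting it explicitly follows; the profile name `afd-example` and the SKU are assumptions for illustration, and the allowed range should be checked against current documentation before relying on a specific value.

```bicep
// Hypothetical profile name and SKU; adjust to your deployment.
resource frontDoor 'Microsoft.Cdn/profiles@2023-05-01' = {
  name: 'afd-example'
  location: 'global'
  sku: {
    name: 'Standard_AzureFrontDoor'
  }
  properties: {
    // Time Front Door waits for the origin's response before failing the request.
    originResponseTimeoutSeconds: 60
  }
}
```

Because the setting is profile-wide, a single slow route (for example large file downloads) raises the timeout for every route in the profile. That is one more reason to prefer async patterns over long timeouts.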

Timeout Recommendations

| Scenario | Timeout | Notes |
| --- | --- | --- |
| API endpoints | 30-60 seconds | Default is usually fine |
| File downloads | 120+ seconds | Increase for large files |
| Long-running operations | Avoid | Use async patterns instead |

⚠️ Warning: Long timeouts consume resources. Prefer async patterns for long-running operations.


🏷️ Host Name Preservation

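Problem: Host Name Mismatch

If Front Door forwards requests using the origin's own hostname (for example app.azurewebsites.net) instead of the hostname the client requested (for example api.contoso.com), the application generates cookies and URLs for a hostname the browser never sees.

Issues: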

  • Cookies set for wrong domain
  • Redirects point to internal hostname
  • OAuth callbacks fail

Solution: Preserve Host Header

resource origin 'Microsoft.Cdn/profiles/originGroups/origins@2023-05-01' = {
  name: 'origin-app'
  parent: originGroup
  properties: {
    hostName: 'app.azurewebsites.net'
    originHostHeader: 'api.contoso.com' // Send original host header
    // ...
  }
}
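
The preserved host header only helps if the same hostname is also the one clients use to reach Front Door. A minimal sketch of registering it as a Front Door custom domain is shown below, assuming a Front Door managed certificate; the parameter `frontDoorName` and the resource name `api-contoso-com` are hypothetical. DNS validation, associating the domain with a route, and configuring the origin to accept the hostname (for example a custom domain binding on App Service) are not shown.

```bicep
// Hypothetical parameter: name of the existing Front Door profile.
param frontDoorName string

resource frontDoor 'Microsoft.Cdn/profiles@2023-05-01' existing = {
  name: frontDoorName
}

// Custom domain matching the originHostHeader sent to the origin.
resource customDomain 'Microsoft.Cdn/profiles/customDomains@2023-05-01' = {
  name: 'api-contoso-com'
  parent: frontDoor
  properties: {
    hostName: 'api.contoso.com'
    tlsSettings: {
      certificateType: 'ManagedCertificate' // Assumption: Front Door issues and renews the certificate
      minimumTlsVersion: 'TLS12'
    }
  }
}
```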

🚫 Session Affinity

Reliability Impact

| With Session Affinity | Without Session Affinity |
| --- | --- |
| User "stuck" to one origin | Requests distributed across origins |
| Origin failure = user disruption | Seamless failover |
| Uneven load distribution | Even load distribution |
| Not recommended | Recommended |

Recommendation

resource originGroup 'Microsoft.Cdn/profiles/originGroups@2023-05-01' = {
  name: 'og-api'
  parent: frontDoor
  properties: {
    sessionAffinityState: 'Disabled' // Recommended for reliability
    // ...
  }
}

💡 If you need session affinity: Design your application to handle graceful session recovery when the origin changes.


🛡️ Caching for Reliability

Caching provides reliability benefits beyond performance.

Benefits:

  • Serve cached content during origin outages
  • Reduce load on origins (fewer requests to fail)
  • Absorb traffic spikes
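
As a sketch of how caching might be enabled, the route below turns on caching for a static content path. The parameter names, the route name `static-route`, and the `/static/*` pattern are assumptions for illustration; the profile, endpoint, and origin group are assumed to exist already. How long responses stay cached is largely determined by the `Cache-Control` headers the origin returns, so cached content remains available during an outage only while it is still fresh.

```bicep
// Hypothetical parameters naming existing resources.
param frontDoorName string
param endpointName string
param originGroupName string

resource frontDoor 'Microsoft.Cdn/profiles@2023-05-01' existing = {
  name: frontDoorName
}

resource endpoint 'Microsoft.Cdn/profiles/afdEndpoints@2023-05-01' existing = {
  name: endpointName
  parent: frontDoor
}

resource originGroup 'Microsoft.Cdn/profiles/originGroups@2023-05-01' existing = {
  name: originGroupName
  parent: frontDoor
}

resource cachedRoute 'Microsoft.Cdn/profiles/afdEndpoints/routes@2023-05-01' = {
  name: 'static-route'
  parent: endpoint
  properties: {
    originGroup: { id: originGroup.id }
    supportedProtocols: [
      'Https'
    ]
    patternsToMatch: [
      '/static/*' // Assumption: static assets live under this path
    ]
    forwardingProtocol: 'HttpsOnly'
    linkToDefaultDomain: 'Enabled'
    httpsRedirect: 'Disabled'
    // Presence of cacheConfiguration enables caching on this route.
    cacheConfiguration: {
      queryStringCachingBehavior: 'IgnoreQueryString'
    }
  }
}
```

Limiting caching to paths that serve cacheable content avoids accidentally caching personalized or authenticated responses.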

🔁 Redundant Traffic Management (Mission-Critical)

For mission-critical workloads, consider redundant global load balancers, so that traffic can be rerouted if Front Door itself becomes unavailable.

⚠️ Consider carefully: This adds significant complexity and cost. Only for workloads with near-zero tolerance for downtime.

| Scenario | Recommendation |
| --- | --- |
| Standard web app | Single Front Door instance |
| Mission-critical (99.99%+) | Consider redundant CDN |
| Content delivery only | Global content delivery pattern |

📊 Configuration Recommendations Summary

| Recommendation | Benefit |
| --- | --- |
| Multiple origins in origin groups | Redundancy and automatic failover |
| Configure health probes | Detect unhealthy origins |
| Use HEAD requests for probes | Less overhead on origins |
| Set appropriate timeouts | Prevent resource exhaustion |
| Preserve host names | Avoid cookie/redirect issues |
| Disable session affinity | Improve failover reliability |
| Enable caching | Serve content during outages |
