
18 - Capacity Planning

Right-sizing APIM units, traffic estimation, and scaling strategies


🎯 Capacity Fundamentals

What is a "Unit"?

A unit is the base scale factor for APIM. Each unit provides:

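| Tier | Requests/sec per unit (approx) | Cache per unit | Notes |
|---|---|---|---|
| Developer | ~500 | 10 MB | No SLA, single unit (cannot scale) |
| Basic | ~1,000 | 50 MB | Up to 2 units |
| Standard | ~2,500 | 1 GB | Up to 4 units |
| Premium | ~4,000 | 5 GB | Zone redundancy available |
| Basic v2 | ~5,000 | N/A | VNet supported |
| Standard v2 | ~10,000 | N/A | Higher throughput |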

⚠️ These are approximations. Actual throughput depends on policies, payload size, and backend latency.


📊 Capacity Metric

The Capacity metric is your primary scaling indicator:

| Capacity Range | Status | Action |
|---|---|---|
| 0-40% | Underutilized | Consider scaling in (non-prod) |
| 40-70% | Optimal | Monitor normally |
| 70-80% | Warning | Plan to scale out |
| 80-100% | Critical | Scale out immediately |
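As a quick illustration, the bands above can be encoded as a small lookup, for example inside an alerting script. This is a minimal sketch. The function name `capacity_status` is hypothetical. The table's ranges share their endpoints (40, 70, 80), so this sketch assumes each boundary value belongs to the higher band.

```python
def capacity_status(capacity_pct: float) -> tuple[str, str]:
    """Map an APIM Capacity reading (percent) to (status, action).

    Assumption: boundary values (40, 70, 80) belong to the higher band.
    """
    if not 0 <= capacity_pct <= 100:
        raise ValueError("Capacity is a percentage between 0 and 100")
    if capacity_pct < 40:
        return "Underutilized", "Consider scaling in (non-prod)"
    if capacity_pct < 70:
        return "Optimal", "Monitor normally"
    if capacity_pct < 80:
        return "Warning", "Plan to scale out"
    return "Critical", "Scale out immediately"


print(capacity_status(75))  # ('Warning', 'Plan to scale out')
```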

📏 Sizing Methodology

Step 1: Gather Requirements

| Metric | How to Measure | Example |
|---|---|---|
| Peak RPS | Max requests per second | 5,000 RPS |
| Average RPS | Normal traffic | 1,000 RPS |
| Payload Size | Average request/response | 10 KB |
| Backend Latency | P95 backend response | 200 ms |
| Policy Complexity | Number of policies | 5-10 policies |
| Growth Projection | 12-month growth | 50% |

Step 2: Calculate Base Units

Formula:

Base Units = Peak RPS / Tier Throughput per Unit, rounded up to a whole unit

Example (Premium tier):

Peak RPS: 10,000
Premium throughput: ~4,000 RPS per unit
Base Units = 10,000 / 4,000 = 2.5 → Round up to 3 units
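The Step 2 formula is simple enough to script. The following is a minimal sketch that reuses the approximate per-unit figures from the tier table, so its output is only as good as those estimates. The names `TIER_RPS_PER_UNIT` and `base_units` are hypothetical.

```python
import math

# Approximate per-unit throughput from the tier table (requests/sec).
# Planning figures only; real throughput depends on policies and payloads.
TIER_RPS_PER_UNIT = {
    "Developer": 500,
    "Basic": 1_000,
    "Standard": 2_500,
    "Premium": 4_000,
    "Basic v2": 5_000,
    "Standard v2": 10_000,
}


def base_units(peak_rps: float, tier: str) -> int:
    """Base Units = Peak RPS / per-unit throughput, rounded up to a whole unit."""
    if peak_rps <= 0:
        raise ValueError("peak_rps must be positive")
    return math.ceil(peak_rps / TIER_RPS_PER_UNIT[tier])


print(base_units(10_000, "Premium"))  # 3 (2.5 rounded up)
```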

Step 3: Apply Adjustments

| Factor | Multiplier | When to Apply |
|---|---|---|
| Complex policies | 1.2-1.5x | JWT validation, transforms |
| Large payloads (>50 KB) | 1.3-1.5x | File uploads, large responses |
| High backend latency (>1s) | 1.2x | Slow backends hold connections |
| Zone redundancy | Floor of 3 units | Premium tier HA |
| Growth buffer | 1.3-1.5x | 12-month projection |

Example with adjustments:

Base: 3 units
Complex policies: 3 × 1.3 = 3.9
Zone redundancy: minimum of 3 (already met)
Growth buffer: 3.9 × 1.3 = 5.07 → Round up to 6 units
Final: 6 Premium units
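The Step 3 adjustments can be sketched as a single function. The multipliers are applied to the base units, the result is rounded up (the same rule Step 2 uses), and then the zone-redundancy floor of 3 is enforced. This is a minimal sketch. The function and parameter names are hypothetical, and the factor values are whatever you pick from the ranges in the table. Under the round-up rule, the worked example's 5.07 becomes 6 units.

```python
import math


def adjusted_units(
    base: int,
    policy_factor: float = 1.0,
    payload_factor: float = 1.0,
    latency_factor: float = 1.0,
    growth_factor: float = 1.0,
    zone_redundant: bool = False,
) -> int:
    """Apply the Step 3 multipliers, round up, then apply the zone-redundancy floor."""
    raw = base * policy_factor * payload_factor * latency_factor * growth_factor
    # Round to 6 decimals first so float noise (e.g. 12.000000000000002)
    # does not push an exact result up to the next unit.
    units = math.ceil(round(raw, 6))
    if zone_redundant:
        units = max(units, 3)  # Premium zone-redundant floor from the table
    return units


# Worked example: 3 base units, complex policies (1.3x), growth buffer (1.3x).
print(adjusted_units(3, policy_factor=1.3, growth_factor=1.3, zone_redundant=True))  # 6
```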

📈 Traffic Pattern Analysis

Understand Your Patterns

| Pattern | Characteristics | Scaling Strategy |
|---|---|---|
| Steady | Consistent load | Fixed units |
| Cyclical | Predictable peaks | Schedule-based autoscale |
| Spiky | Unpredictable bursts | Metric-based autoscale + buffer |
| Growing | Continuous increase | Proactive scaling + monitoring |

Traffic Analysis Query

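```kusto
// Analyze traffic patterns: hourly request counts over the last 7 days
ApiManagementGatewayLogs
| where TimeGenerated > ago(7d)
| summarize Requests = count() by bin(TimeGenerated, 1h)
| summarize
    Min = min(Requests),
    Avg = avg(Requests),
    P95 = percentile(Requests, 95),
    Max = max(Requests)
// Convert hourly counts to requests/sec. Hourly bins smooth out bursts, so
// PeakHourRPS understates the instantaneous peak; use 1m bins (/ 60.0) for finer detail.
| extend AvgRPS = Avg / 3600.0, PeakHourRPS = Max / 3600.0
```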
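To connect the query output to the pattern table, here is a rough Python heuristic that works on the hourly counts (for example, exported from the query before the second `summarize`). It is a sketch under loose assumptions. The 1.5 thresholds are illustrative and not from any Microsoft guidance. The "Growing" pattern needs trend analysis, which this sketch does not cover. The function names are hypothetical.

```python
import statistics


def summarize_hourly(counts: list[int]) -> dict[str, float]:
    """Mirror the KQL summary (Min/Avg/P95/Max) and convert to requests/sec."""
    p95 = statistics.quantiles(counts, n=100, method="inclusive")[94]
    return {
        "min": min(counts),
        "avg": statistics.fmean(counts),
        "p95": p95,
        "max": max(counts),
        # Busiest hour's *average* RPS: a lower bound on the true peak.
        "peak_hour_avg_rps": max(counts) / 3600,
    }


def classify_pattern(counts: list[int]) -> str:
    """Hypothetical heuristic; both 1.5 thresholds are illustrative assumptions."""
    s = summarize_hourly(counts)
    if s["max"] / s["avg"] < 1.5:
        return "Steady"    # peaks close to the average
    if s["max"] / s["p95"] > 1.5:
        return "Spiky"     # rare bursts far above the usual busy hours
    return "Cyclical"      # high peaks that recur regularly


week = ([1_000] * 12 + [3_000] * 12) * 7  # 7 days of day/night cycles
print(classify_pattern(week))  # Cyclical
```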

⚙️ Autoscaling Configuration

Metric-Based Autoscaling (Bicep)

```bicep
// Assumed inputs (not part of the original snippet): APIM instance name and region
param location string = resourceGroup().location
param apimName string

// Existing APIM instance whose Capacity metric drives the rules
resource apim 'Microsoft.ApiManagement/service@2022-08-01' existing = {
  name: apimName
}

resource autoscale 'Microsoft.Insights/autoscalesettings@2022-10-01' = {
  name: 'apim-autoscale'
  location: location
  properties: {
    enabled: true
    targetResourceUri: apim.id
    profiles: [
      {
        name: 'Default'
        capacity: {
          minimum: '3' // Zone redundancy minimum
          maximum: '10' // Cost cap
          default: '3'
        }
        rules: [
          // Scale OUT by 1 unit when average Capacity > 70% over 5 minutes
          {
            metricTrigger: {
              metricName: 'Capacity'
              metricResourceUri: apim.id
              timeGrain: 'PT1M'
              statistic: 'Average'
              timeWindow: 'PT5M'
              timeAggregation: 'Average'
              operator: 'GreaterThan'
              threshold: 70
            }
            scaleAction: {
              direction: 'Increase'
              type: 'ChangeCount'
              value: '1'
              cooldown: 'PT10M'
            }
          }
          // Scale IN by 1 unit when average Capacity < 30% over 30 minutes
          {
            metricTrigger: {
              metricName: 'Capacity'
              metricResourceUri: apim.id
              timeGrain: 'PT1M'
              statistic: 'Average'
              timeWindow: 'PT30M' // Longer window for scale-in
              timeAggregation: 'Average'
              operator: 'LessThan'
              threshold: 30
            }
            scaleAction: {
              direction: 'Decrease'
              type: 'ChangeCount'
              value: '1'
              cooldown: 'PT60M' // Longer cooldown for scale-in
            }
          }
        ]
      }
    ]
  }
}
```
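To reason about how the two rules interact, here is a simplified Python model of a single evaluation pass. It is a sketch, not how Azure Monitor autoscale is implemented. It assumes one averaged Capacity sample per minute (matching `timeGrain: 'PT1M'`), it ignores cooldowns, and it checks the scale-out rule first. The function name `next_unit_count` is hypothetical.

```python
from statistics import fmean

MIN_UNITS, MAX_UNITS = 3, 10  # capacity.minimum / capacity.maximum above


def next_unit_count(units: int, capacity_samples: list[float]) -> int:
    """Apply the scale-out and scale-in rules to per-minute Capacity samples.

    Samples are ordered oldest to newest. Cooldowns are not modelled.
    """
    last_5 = capacity_samples[-5:]    # timeWindow PT5M
    last_30 = capacity_samples[-30:]  # timeWindow PT30M
    if len(last_5) == 5 and fmean(last_5) > 70:
        return min(units + 1, MAX_UNITS)   # scale out, capped at the maximum
    if len(last_30) == 30 and fmean(last_30) < 30:
        return max(units - 1, MIN_UNITS)   # scale in, never below the minimum
    return units


print(next_unit_count(3, [75] * 5))   # 4
print(next_unit_count(5, [20] * 30))  # 4
```

The asymmetric windows mean a short burst can add a unit within about 5 minutes, while removing one needs 30 minutes of sustained low usage.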

Schedule-Based Autoscaling

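```bicep
// Recurring profiles for known peak hours, placed in properties.profiles of the
// autoscale setting. A recurring profile stays active until the next one starts,
// so OffHours (from Friday 20:00) also covers the weekend.
profiles: [
  {
    name: 'BusinessHours'
    capacity: {
      minimum: '5'
      maximum: '10'
      default: '5'
    }
    recurrence: {
      frequency: 'Week'
      schedule: {
        timeZone: 'W. Europe Standard Time'
        days: ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
        hours: [8]
        minutes: [0]
      }
    }
    rules: [] // Copy the Default profile's rules here; an empty array disables metric-based scaling in this window
  }
  {
    name: 'OffHours'
    capacity: {
      minimum: '3'
      maximum: '5'
      default: '3'
    }
    recurrence: {
      frequency: 'Week'
      schedule: {
        timeZone: 'W. Europe Standard Time'
        days: ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
        hours: [20]
        minutes: [0]
      }
    }
    rules: [] // Copy the Default profile's rules here as well
  }
]
```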
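The schedule can be sanity-checked by working out which profile should be active at a given time. This is a minimal sketch. It assumes the input is already expressed in W. Europe Standard Time (no time-zone conversion is done), and it relies on the rule that a recurring profile stays in effect until the next one starts. The function name `active_profile` is hypothetical.

```python
from datetime import datetime


def active_profile(local_time: datetime) -> str:
    """Return the recurring profile in effect at local_time (W. Europe time assumed).

    BusinessHours starts Mon-Fri 08:00 and OffHours starts Mon-Fri 20:00, so
    OffHours also runs from Friday 20:00 until Monday 08:00.
    """
    is_weekday = local_time.weekday() < 5  # Monday=0 ... Friday=4
    if is_weekday and 8 <= local_time.hour < 20:
        return "BusinessHours"
    return "OffHours"


print(active_profile(datetime(2024, 1, 1, 9, 0)))   # Monday 09:00 -> BusinessHours
print(active_profile(datetime(2024, 1, 6, 10, 0)))  # Saturday 10:00 -> OffHours
```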

💰 Cost-Capacity Tradeoffs

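Premium Tier Pricing (Approx., ~$2,800 per unit per month; varies by region)

| Configuration | Units | Est. Monthly Cost | Use Case |
|---|---|---|---|
| Minimum HA | 2 | ~$5,600 | Small production |
| Zone Redundant | 3 | ~$8,400 | Standard production |
| High Availability | 4 | ~$11,200 | Important workloads |
| High Throughput | 6 | ~$16,800 | High-traffic APIs |
| Enterprise | 10 | ~$28,000 | Large scale |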
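To compare a fixed unit count against the schedule-based profiles, the table's implied per-unit price can be turned into a small estimator. This is a sketch under stated assumptions. The ~$2,800 per-unit figure is only the approximation implied by the table. The scheduled estimate assumes charges are prorated by the hours each unit runs. The names are hypothetical.

```python
# Approximate Premium price per unit per month, implied by the table above.
PREMIUM_UNIT_MONTHLY_USD = 2_800
HOURS_PER_WEEK = 168


def monthly_cost(units: int, unit_price: float = PREMIUM_UNIT_MONTHLY_USD) -> float:
    """Fixed unit count for the whole month."""
    return units * unit_price


def scheduled_monthly_cost(
    peak_units: int,
    off_units: int,
    peak_hours_per_week: int,
    unit_price: float = PREMIUM_UNIT_MONTHLY_USD,
) -> float:
    """Weighted by unit-hours, assuming hourly proration of the monthly price."""
    unit_hours = peak_units * peak_hours_per_week + off_units * (
        HOURS_PER_WEEK - peak_hours_per_week
    )
    return unit_hours * unit_price / HOURS_PER_WEEK


print(monthly_cost(5))                      # 14000
print(scheduled_monthly_cost(5, 3, 5 * 12)) # 10400.0 (5 units 08:00-20:00 Mon-Fri, else 3)
```

Under these assumptions, the BusinessHours/OffHours schedule costs roughly a quarter less than running 5 units all week.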

Optimization Strategies

| Strategy | Potential Savings | Implementation |
|---|---|---|
| Right-size for actual traffic | 20-40% | Monitor and adjust |
| Autoscale down off-hours | 15-25% | Schedule-based scaling |
| Caching | 10-30% | Reduce backend calls |
| Reserved instances | 30-50% | Commit to 1-3 years |
| Use v2 tiers | Varies | Evaluate Basic/Standard v2 |

📊 Load Testing

Azure Load Testing Integration

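```yaml
# load-test.yaml (Azure Load Testing test configuration)
version: v0.1
testId: apim-load-test   # lowercase letters, digits, '-' or '_' only
displayName: APIM Load Test
testPlan: apim-load-test.jmx
description: Validate APIM capacity under load
engineInstances: 5
failureCriteria:
  - avg(response_time_ms) > 500
  - percentage(error) > 5
```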

Key Metrics to Capture

| Metric | Target | Concern Threshold |
|---|---|---|
| P50 Response Time | < 100 ms | > 200 ms |
| P95 Response Time | < 500 ms | > 1000 ms |
| P99 Response Time | < 1000 ms | > 2000 ms |
| Error Rate | < 0.1% | > 1% |
| Throughput | Target RPS | < 80% of target |
| Capacity | < 70% | > 80% |
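Load-test results can be graded against this table automatically, for example in a CI step after the test run. This is a minimal sketch. The metric keys and the "watch" label for values between target and concern threshold are hypothetical choices. Throughput is handled separately because higher is better.

```python
# (target, concern) pairs from the table; lower is better for these metrics.
THRESHOLDS = {
    "p50_ms": (100, 200),
    "p95_ms": (500, 1000),
    "p99_ms": (1000, 2000),
    "error_rate_pct": (0.1, 1.0),
    "capacity_pct": (70, 80),
}


def grade(metric: str, value: float) -> str:
    """'ok' below target, 'concern' above the concern threshold, else 'watch'."""
    target, concern = THRESHOLDS[metric]
    if value < target:
        return "ok"
    if value > concern:
        return "concern"
    return "watch"


def grade_throughput(measured_rps: float, target_rps: float) -> str:
    """Higher is better: concern if below 80% of the target RPS."""
    if measured_rps >= target_rps:
        return "ok"
    if measured_rps < 0.8 * target_rps:
        return "concern"
    return "watch"


print(grade("p95_ms", 700), grade_throughput(7_000, 10_000))  # watch concern
```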

Pre-Production Load Test Checklist

  • Test with production-like payload sizes
  • Test with production-like backend latency
  • Run sustained load (30+ minutes)
  • Test burst scenarios (3x normal)
  • Verify autoscaling triggers
  • Capture baseline metrics
  • Test during scale-out operation

📋 Capacity Planning Checklist

Initial Sizing

  • Peak RPS estimated
  • Payload sizes documented
  • Backend latencies measured
  • Policy complexity assessed
  • Growth projection calculated
  • Base units calculated
  • Adjustments applied
  • Zone redundancy considered

Ongoing Management

  • Capacity alerts configured
  • Autoscaling rules defined
  • Monthly capacity review scheduled
  • Quarterly right-sizing review
  • Load testing before major releases
  • Budget alerts configured

🔢 Quick Reference Tables

Units by Traffic

| Daily Requests | Avg RPS | Recommended Units (Premium) |
|---|---|---|
| 1M | ~12 | 2-3 |
| 5M | ~58 | 3 |
| 10M | ~116 | 3-4 |
| 50M | ~580 | 4-5 |
| 100M | ~1,160 | 5-6 |
| 500M | ~5,800 | 8-10 |
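The Avg RPS column is simply daily requests divided by 86,400 seconds. The sketch below reproduces that conversion and feeds an estimated peak into the Step 2 formula. The peak-to-average factor of 3 is an assumption (borrowed from the 3x burst scenario in the load-test checklist). The result covers base units only, before the Step 3 multipliers and HA floors, so it will sit below the table's recommendations. Function names are hypothetical.

```python
import math

SECONDS_PER_DAY = 86_400


def avg_rps(daily_requests: int) -> float:
    """Average requests/sec over a 24-hour day."""
    return daily_requests / SECONDS_PER_DAY


def estimated_base_units(
    daily_requests: int, peak_to_avg: float = 3.0, rps_per_unit: int = 4_000
) -> int:
    """Step 2 base units for an assumed peak (default: 3x average, Premium tier)."""
    peak_rps = avg_rps(daily_requests) * peak_to_avg
    return math.ceil(peak_rps / rps_per_unit)


print(round(avg_rps(100_000_000)))         # 1157
print(estimated_base_units(500_000_000))   # 5, before Step 3 adjustments
```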

Scaling Times

| Action | Approximate Time |
|---|---|
| Add 1 unit | 15-30 minutes |
| Remove 1 unit | 15-30 minutes |
| Scale from 2 to 5 units | 30-45 minutes |

Related Documents

| Document | Description |
|---|---|
| 09-Cost-Optimization | Cost strategies |
| 02-Reliability | HA requirements |
| 12-Tradeoffs | Capacity vs cost |

Back to: README - Main documentation index
