10 - Performance Efficiency
Caching strategies, autoscaling, and latency optimization
📋 WAF Workload Design Checklist
Based on Azure Well-Architected Framework - Performance Efficiency
| # | Recommendation | Status |
|---|---|---|
| ✅ | (Service) Define performance targets: capacity, CPU, memory, request duration, throughput | ☐ |
| ✅ | (Service) Dynamically scale to match demand with autoscale rules | ☐ |
| ✅ | (Service) Collect performance data using built-in analytics, Azure Monitor, App Insights | ☐ |
| ✅ | (Service) Test performance under production conditions with load testing | ☐ |
| ✅ | (Service) Review documented limits and constraints for APIM tier | ☐ |
| ✅ | (API) Minimize expensive processing (large payloads, WebSockets) with validate-content | ☐ |
| ✅ | (API) Evaluate caching policies or external cache for performance improvement | ☐ |
| ✅ | (API) Consider Azure Front Door / App Gateway for TLS offloading | ☐ |
| ✅ | (Service & API) Evaluate logic placement impact between gateway, backend, and entry point | ☐ |
| ✅ | (Service & API) Collect: request processing time, resource usage, throughput, cache hit ratio | ☐ |
⚡ Validate-Content for Large Payloads
WAF Recommendation: Minimize expensive processing with validate-content policy
```xml
<!-- Validate and limit large request bodies -->
<inbound>
    <base />
    <validate-content
        unspecified-content-type-action="prevent"
        max-size="102400"
        size-exceeded-action="prevent"
        errors-variable-name="validationErrors">
        <!-- action="detect" records errors in validationErrors so the custom
             response below can fire; "prevent" would short-circuit with the
             built-in error response instead -->
        <content type="application/json" validate-as="json" action="detect" />
        <content type="application/xml" validate-as="xml" action="detect" />
    </validate-content>
    <!-- Return friendly error for validation failures -->
    <choose>
        <when condition="@(context.Variables.ContainsKey(&quot;validationErrors&quot;))">
            <return-response>
                <set-status code="400" reason="Bad Request" />
                <set-body>@{
                    return new JObject(
                        new JProperty("error", "Validation failed"),
                        new JProperty("details", JToken.FromObject(context.Variables["validationErrors"]))
                    ).ToString();
                }</set-body>
            </return-response>
        </when>
    </choose>
</inbound>
```
Payload Size Limits
| Tier | Max Request Size | Recommendation |
|---|---|---|
| Developer | 256 KB | For testing only |
| Basic | 256 KB | Small payloads |
| Standard | 256 KB | Small payloads |
| Premium | 256 KB | Use validate-content |
| v2 tiers | 2 MB | Larger payloads supported |
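Independent of tier limits, an inexpensive guard can reject oversized requests before any body processing by checking `Content-Length` in the inbound section. A sketch; the 256 KB threshold and error shape are illustrative, and chunked requests without a `Content-Length` header fall through to `validate-content`:

```xml
<inbound>
    <base />
    <!-- Reject oversized requests early, before any body processing.
         Content-Length may be absent (chunked transfer), so default to 0. -->
    <choose>
        <when condition="@(long.Parse(context.Request.Headers.GetValueOrDefault(&quot;Content-Length&quot;, &quot;0&quot;)) > 262144)">
            <return-response>
                <set-status code="413" reason="Payload Too Large" />
            </return-response>
        </when>
    </choose>
</inbound>
```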
📍 Logic Placement Optimization
WAF Recommendation: Evaluate performance impact of logic placement
| Logic | Best Location | Rationale |
|---|---|---|
| TLS Termination | Front Door/App Gateway | Offload from APIM |
| Geo-Routing | Front Door | Edge processing |
| Response Caching | APIM (built-in) | Reduce backend load |
| Request Validation | APIM | Protect backends |
| Business Logic | Backend | Avoid gateway bloat |
| Complex Transforms | Backend | Better compute resources |
🎯 Performance Targets
| Metric | Target | Critical |
|---|---|---|
| P50 Latency | < 100ms | < 500ms |
| P95 Latency | < 200ms | < 1000ms |
| P99 Latency | < 500ms | < 2000ms |
| Cache Hit Rate | > 70% | > 50% |
| Availability | 99.95% | 99.9% |
🚀 Response Caching
Basic Response Caching
```xml
<policies>
    <inbound>
        <base />
        <!-- Check cache first -->
        <cache-lookup vary-by-developer="false"
                      vary-by-developer-groups="false"
                      caching-type="internal"
                      downstream-caching-type="public"
                      must-revalidate="true" />
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
        <!-- Store successful responses -->
        <choose>
            <when condition="@(context.Response.StatusCode == 200)">
                <cache-store duration="3600" />
            </when>
        </choose>
        <!-- Add cache headers -->
        <set-header name="X-Cache" exists-action="override">
            <value>@(context.Response.Headers.GetValueOrDefault("X-Cache", "MISS"))</value>
        </set-header>
    </outbound>
</policies>
```
Vary Cache by Query Parameters
```xml
<cache-lookup vary-by-query-parameter="version,region,lang"
              vary-by-developer="false"
              vary-by-developer-groups="false"
              caching-type="internal" />
```
Vary Cache by Headers
```xml
<cache-lookup vary-by-header="Accept,Accept-Language"
              vary-by-developer="false"
              vary-by-developer-groups="false"
              caching-type="internal" />
```
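Beyond full response caching, individual values (for example an expensive per-user lookup) can be cached with the `cache-lookup-value` / `cache-store-value` policies. A sketch: the `X-User-Id` and `X-User-Tier` headers, key prefix, and 300-second duration are illustrative, not from the scenario above:

```xml
<inbound>
    <base />
    <!-- Try the cache first; on a miss the variable stays unset -->
    <cache-lookup-value key="@(&quot;userTier-&quot; + context.Request.Headers.GetValueOrDefault(&quot;X-User-Id&quot;, &quot;anon&quot;))"
                        variable-name="userTier" />
</inbound>
<outbound>
    <base />
    <!-- Populate the cache once the backend has resolved the value -->
    <cache-store-value key="@(&quot;userTier-&quot; + context.Request.Headers.GetValueOrDefault(&quot;X-User-Id&quot;, &quot;anon&quot;))"
                       value="@(context.Response.Headers.GetValueOrDefault(&quot;X-User-Tier&quot;, &quot;standard&quot;))"
                       duration="300" />
</outbound>
```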
🔴 External Redis Cache
Architecture with Redis
Redis Cache Configuration (Bicep)
```bicep
// From customer scenario: modules/api-management-core.bicep
resource apim 'Microsoft.ApiManagement/service@2023-05-01-preview' = {
  name: apimName
  location: location
  properties: {
    // ... other config
  }
}

resource redisCache 'Microsoft.ApiManagement/service/caches@2023-05-01-preview' = {
  // The cache id must be 'default' or a valid Azure region identifier
  name: 'default'
  parent: apim
  properties: {
    connectionString: 'redis-host.redis.cache.windows.net:6380,password=xxx,ssl=True,abortConnect=False'
    useFromLocation: 'default'
    description: 'External Redis cache for high-performance caching'
    resourceId: 'https://management.azure.com/subscriptions/${subscription().subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Cache/redis/redis-apim-cache'
  }
}
```
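Rather than hardcoding the Redis password, the connection string can be composed at deployment time with `listKeys`. A sketch as an alternative to the snippet above; it assumes `apim` is the service resource declared earlier and that the cache named `redis-apim-cache` exists in the same resource group (the API versions are illustrative):

```bicep
// Reference the existing Azure Cache for Redis instance
resource redis 'Microsoft.Cache/redis@2023-08-01' existing = {
  name: 'redis-apim-cache'
}

resource redisCacheRef 'Microsoft.ApiManagement/service/caches@2023-05-01-preview' = {
  name: 'default' // cache id must be 'default' or an Azure region
  parent: apim
  properties: {
    // Build the connection string at deployment time instead of embedding the key
    connectionString: '${redis.properties.hostName}:6380,password=${redis.listKeys().primaryKey},ssl=True,abortConnect=False'
    useFromLocation: 'default'
  }
}
```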
External Cache Policy
```xml
<policies>
    <inbound>
        <base />
        <cache-lookup vary-by-developer="false"
                      vary-by-developer-groups="false"
                      caching-type="external"
                      downstream-caching-type="public" />
    </inbound>
    <outbound>
        <base />
        <!-- cache-store has no caching-type attribute; it stores into the
             cache selected by the preceding cache-lookup -->
        <cache-store duration="600" />
    </outbound>
</policies>
```
⚡ Backend Performance
HTTP Version and Buffering
```xml
<backend>
    <!-- Prefer HTTP/2 to the backend when it supports it; buffer the request
         body so it can be re-sent on retries, and stream the response -->
    <forward-request timeout="30"
                     http-version="2or1"
                     buffer-request-body="true"
                     buffer-response="false" />
</backend>
```
Retry with Exponential Backoff
```xml
<backend>
    <retry condition="@(context.Response.StatusCode >= 500)"
           count="3"
           interval="1"
           delta="2"
           max-interval="10"
           first-fast-retry="true">
        <!-- retry re-executes its child policies on each attempt -->
        <forward-request buffer-request-body="true" />
    </retry>
</backend>
```
Timeout Configuration
```xml
<backend>
    <forward-request timeout="30" follow-redirects="true" />
</backend>
```
```xml
<!-- Different timeouts per operation: set-backend-service has no timeout
     attribute, so vary forward-request in the backend section instead -->
<backend>
    <choose>
        <when condition="@(context.Operation.Id == &quot;heavy-report&quot;)">
            <forward-request timeout="120" />
        </when>
        <otherwise>
            <forward-request timeout="30" />
        </otherwise>
    </choose>
</backend>
```
📊 Request/Response Optimization
Compression
API Management has no policy that compresses payloads itself, and setting a `Content-Encoding` header without actually compressing the body corrupts the response. Instead, let the client and backend negotiate compression and pass it through the gateway:
```xml
<inbound>
    <base />
    <!-- Ensure the backend sees the client's compression preference so it
         can return a gzip body that APIM passes through unchanged -->
    <set-header name="Accept-Encoding" exists-action="skip">
        <value>gzip</value>
    </set-header>
</inbound>
```
Minimize Payload
```xml
<outbound>
    <!-- Remove unnecessary fields -->
    <set-body>@{
        var body = context.Response.Body.As<JObject>();
        body.Remove("internalId");
        body.Remove("debugInfo");
        body.Remove("metadata");
        return body.ToString();
    }</set-body>
</outbound>
```
Conditional GET (ETag)
```xml
<outbound>
    <base />
    <set-header name="ETag" exists-action="override">
        <value>@{
            var body = context.Response.Body.As<string>(preserveContent: true);
            using (var md5 = System.Security.Cryptography.MD5.Create())
            {
                var hash = md5.ComputeHash(System.Text.Encoding.UTF8.GetBytes(body));
                return "\"" + BitConverter.ToString(hash).Replace("-", "").ToLower() + "\"";
            }
        }</value>
    </set-header>
</outbound>
```
```xml
<inbound>
    <base />
    <!-- Check If-None-Match; assumes an earlier policy (for example
         cache-lookup-value) populated the cachedETag variable -->
    <choose>
        <when condition="@(context.Request.Headers.GetValueOrDefault(&quot;If-None-Match&quot;, &quot;&quot;) == context.Variables.GetValueOrDefault<string>(&quot;cachedETag&quot;))">
            <return-response>
                <set-status code="304" reason="Not Modified" />
            </return-response>
        </when>
    </choose>
</inbound>
```
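One way to populate `cachedETag` is to keep the last computed ETag per URL in the cache with the value-caching policies. A sketch; the key prefix and one-hour duration are illustrative:

```xml
<inbound>
    <base />
    <!-- Load the last known ETag for this URL into cachedETag (unset on miss) -->
    <cache-lookup-value key="@(&quot;etag-&quot; + context.Request.Url.Path)"
                        variable-name="cachedETag" />
</inbound>
<outbound>
    <base />
    <!-- Remember the ETag this response was served with -->
    <cache-store-value key="@(&quot;etag-&quot; + context.Request.Url.Path)"
                       value="@(context.Response.Headers.GetValueOrDefault(&quot;ETag&quot;, &quot;&quot;))"
                       duration="3600" />
</outbound>
```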
📈 Autoscaling Configuration
Capacity-Based Scaling
```bicep
resource autoscale 'Microsoft.Insights/autoscalesettings@2022-10-01' = {
  name: 'apim-autoscale-performance'
  location: location
  properties: {
    enabled: true
    targetResourceUri: apim.id
    profiles: [
      {
        name: 'Performance Profile'
        capacity: {
          default: '2'
          minimum: '2'
          maximum: '10'
        }
        rules: [
          // Scale OUT on high capacity
          {
            metricTrigger: {
              metricName: 'Capacity'
              metricResourceUri: apim.id
              timeGrain: 'PT1M'
              statistic: 'Average'
              timeWindow: 'PT5M'
              timeAggregation: 'Average'
              operator: 'GreaterThan'
              threshold: 70
            }
            scaleAction: {
              direction: 'Increase'
              type: 'ChangeCount'
              value: '1'
              cooldown: 'PT10M'
            }
          }
          // Scale IN when low
          {
            metricTrigger: {
              metricName: 'Capacity'
              metricResourceUri: apim.id
              timeGrain: 'PT1M'
              statistic: 'Average'
              timeWindow: 'PT30M'
              timeAggregation: 'Average'
              operator: 'LessThan'
              threshold: 30
            }
            scaleAction: {
              direction: 'Decrease'
              type: 'ChangeCount'
              value: '1'
              cooldown: 'PT30M'
            }
          }
        ]
      }
    ]
  }
}
```
🌐 Geographic Distribution
Multi-Region Deployment
Traffic Manager Configuration
```bicep
resource trafficManager 'Microsoft.Network/trafficmanagerprofiles@2022-04-01' = {
  name: 'tm-apim-global'
  location: 'global'
  properties: {
    profileStatus: 'Enabled'
    trafficRoutingMethod: 'Performance' // Route to closest region
    dnsConfig: {
      relativeName: 'api-global'
      ttl: 60
    }
    monitorConfig: {
      protocol: 'HTTPS'
      port: 443
      path: '/status-0123456789abcdef'
      intervalInSeconds: 30
      timeoutInSeconds: 10
      toleratedNumberOfFailures: 3
    }
    endpoints: [
      {
        name: 'westeurope'
        type: 'Microsoft.Network/trafficManagerProfiles/azureEndpoints'
        properties: {
          targetResourceId: apimWestEurope.id
          endpointStatus: 'Enabled'
          priority: 1
        }
      }
      {
        name: 'eastus'
        type: 'Microsoft.Network/trafficManagerProfiles/azureEndpoints'
        properties: {
          targetResourceId: apimEastUS.id
          endpointStatus: 'Enabled'
          priority: 2
        }
      }
    ]
  }
}
```
📊 Performance Monitoring
Key Performance Metrics
| Metric | Alert Threshold | Action |
|---|---|---|
| BackendDuration | > 2000ms | Scale backend |
| Duration | > 3000ms | Investigate |
| Capacity | > 70% | Auto-scale |
| FailedRequests | > 5% | Alert team |
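The Capacity threshold above can be enforced with a metric alert; a Bicep sketch, assuming `apim` is the service resource and `opsActionGroup` is an action group declared elsewhere (both names are illustrative):

```bicep
resource capacityAlert 'Microsoft.Insights/metricAlerts@2018-03-01' = {
  name: 'apim-capacity-high'
  location: 'global'
  properties: {
    severity: 2
    enabled: true
    scopes: [ apim.id ]
    evaluationFrequency: 'PT1M'
    windowSize: 'PT5M'
    criteria: {
      'odata.type': 'Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria'
      allOf: [
        {
          name: 'CapacityOver70'
          metricName: 'Capacity'
          operator: 'GreaterThan'
          threshold: 70
          timeAggregation: 'Average'
          criterionType: 'StaticThresholdCriterion'
        }
      ]
    }
    actions: [
      {
        actionGroupId: opsActionGroup.id // assumed to exist elsewhere
      }
    ]
  }
}
```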
KQL Query for Latency Analysis
```kusto
ApiManagementGatewayLogs
| where TimeGenerated > ago(1h)
| summarize
    P50 = percentile(TotalTime, 50),
    P95 = percentile(TotalTime, 95),
    P99 = percentile(TotalTime, 99),
    Avg = avg(TotalTime),
    Count = count()
    by bin(TimeGenerated, 5m), OperationId
| order by P95 desc
| render timechart
```
Performance Dashboard Query
```kusto
// Backend vs gateway latency breakdown
ApiManagementGatewayLogs
| where TimeGenerated > ago(1h)
| extend
    GatewayLatency = TotalTime - BackendTime,
    CacheHit = Cache =~ "hit" // built-in cache-status column (none/hit/miss)
| summarize
    AvgGateway = avg(GatewayLatency),
    AvgBackend = avg(BackendTime),
    CacheHitRate = countif(CacheHit) * 100.0 / count()
    by bin(TimeGenerated, 5m)
| render timechart
```
✅ Performance Checklist
Caching
- Response caching enabled
- Cache vary parameters configured
- External Redis for high volume
- Cache hit rate monitored
Backend
- Connection timeouts set
- Retry policies configured
- Circuit breaker implemented
- HTTP/2 enabled
Scaling
- Autoscaling configured
- Capacity alerts set
- Multi-region deployed
- Traffic Manager configured
Monitoring
- Latency dashboards created
- P95/P99 alerts configured
- Backend latency tracked
- Cache hit rate monitored
🔗 Related Documents
| Document | Description |
|---|---|
| 04-Policies | Caching policies |
| 06-Monitoring | Metrics setup |
| 09-Cost-Optimization | Scaling costs |
Next: 11-Monetization - Products, subscriptions, and billing