Platform Observability Scenarios for Large-Scale Azure Landing Zones

Document Purpose: Solutions to common enterprise challenges around platform-level monitoring, visibility into landing zone application teams, and Azure Service Health notifications.

Scenario 1: Platform Monitoring vs. Landing Zone Monitoring
Scenario 2: Visibility into 400+ Landing Zone Application Teams
- How It's Done: Sending Landing Zone Logs to Central Platform LAW
Scenario 3: Azure Service Health Dashboards & Notifications

Scenario 1: Platform Monitoring vs. Landing Zone Monitoring

The Challenge

"Monitoring for the platform itself and monitoring for the landing zone users is a separate thing. We want platform observability for our large platform - dashboards, alerting, and health monitoring."

Understanding the Distinction

Aspect	Platform Monitoring	Landing Zone (Workload) Monitoring
Scope	Central platform services (Identity, Connectivity, Management)	Individual application workloads
Owner	Central Platform Team	Landing Zone / Application Teams
Focus	Infrastructure health, shared services, governance	Application performance, business metrics
Subscription	Management, Connectivity, Identity subscriptions	Landing Zone subscriptions
Examples	ExpressRoute, Azure Firewall, Key Vault, DNS, LAW	VMs, App Services, Databases, Storage

Solution: Azure Monitor Baseline Alerts (AMBA)

Microsoft provides Azure Monitor Baseline Alerts (AMBA) - a policy-driven framework specifically designed for Azure Landing Zone platform monitoring.

Platform Services Covered by AMBA

Platform Component	Alert Types	Metrics Monitored
Azure ExpressRoute	Circuit availability, BGP status	BitsInPerSecond, BitsOutPerSecond
Azure Firewall	SNAT port utilization, health	FirewallHealth, ThroughputBitsPerSecond
Azure Virtual Network	N/A (activity-based)	DDoS alerts, NSG flow
Azure Virtual WAN	Hub health, tunnel status	TunnelBandwidth, BGPPeerStatus
Log Analytics Workspace	Ingestion, latency	IngestionVolumeMB, IngestionLatencyInSeconds
Azure Private DNS	Query volume, failures	QueryVolume, RecordSetCapacityUtilization
Azure Key Vault	Availability, saturation	Availability, SaturationShoebox
Azure Storage Account	Availability, latency	Availability, SuccessE2ELatency

Platform Dashboards for Large Organizations

For a large platform with many services, implement these visualization layers:

Recommended Workbooks for Platform Monitoring

Workbook	Purpose	Source
Activity Log Insights	Platform changes, who did what, admin actions	Built-in
Network Insights	ExpressRoute, VPN, Firewall, VNet health	Azure Portal
Key Vault Insights	Vault operations, access patterns, failures	Azure Portal
Storage Insights	Storage account health across subscriptions	Azure Portal
LAW Health	Workspace ingestion, query performance, cost	Custom/AMBA

Implementation Steps

Deploy AMBA via Policy

Portal → Azure Monitor → Baseline Alerts → Deploy to Management Group

Configure Action Groups (one per subscription minimum)
- Email notification channel
- Teams/Slack webhook for chat ops
- Logic App for ticketing integration
Create Platform Dashboard
- Pin critical metrics from each platform service
- Group by: Connectivity | Identity | Management
Set up Workbooks
- Use built-in Insights workbooks
- Create custom workbooks for organization-specific views

📖 Reference: Monitor Azure Platform Landing Zone Components

Scenario 2: Visibility into 400+ Landing Zone Application Teams

The Challenge

"The platform team should be aware of what's happening to application teams which they onboarded to the landing zones. With 400+ teams, how do we gain visibility without overwhelming the central infrastructure?"

The Challenge at Scale

Challenge	Impact
400+ teams = 400+ subscriptions	Cannot monitor each individually
Each team owns their workloads	Platform team shouldn't manage app-level alerts
Need visibility without control	See what's happening, not micromanage
Different maturity levels	Some teams are advanced, some need guidance

Solution Architecture: Federated Visibility Model

How It's Done: Sending Landing Zone Logs to Central Platform LAW

💡 Key Question: "If dashboards filter by Landing Zone, all landing zones must send their logs to the centralized platform Log Analytics Workspace. But how is this done?"

There are three methods to centralize logs from 400+ Landing Zone subscriptions:

Method 1: Azure Policy with Diagnostic Settings (Recommended)

Use DeployIfNotExists (DINE) policies to automatically configure diagnostic settings on all resources:

Policy Configuration:

{
  "if": {
    "field": "type",
    "equals": "Microsoft.Compute/virtualMachines"
  },
  "then": {
    "effect": "DeployIfNotExists",
    "details": {
      "type": "Microsoft.Insights/diagnosticSettings",
      "existenceCondition": {
        "field": "Microsoft.Insights/diagnosticSettings/workspaceId",
        "equals": "[parameters('centralLogAnalyticsWorkspaceId')]"
      },
      "roleDefinitionIds": [
        "/providers/Microsoft.Authorization/roleDefinitions/..."
      ],
      "deployment": {
        "properties": {
          "template": {
            // ARM template to create diagnostic setting
          }
        }
      }
    }
  }
}

Built-in Policies Available:

Resource Type	Policy Name
Virtual Machines	Configure Azure VMs to send logs to LAW
Storage Accounts	Configure Storage to send logs to LAW
Key Vault	Configure Key Vault to send logs to LAW
App Service	Configure App Service to send logs to LAW
SQL Database	Configure SQL to send logs to LAW
AKS	Configure AKS to send logs to LAW

Method 2: Data Collection Rules (DCR) with Azure Monitor Agent

For VMs and Arc-enabled servers, use Data Collection Rules:

DCR Assignment via Policy:

resource dcrAssignmentPolicy 'Microsoft.Authorization/policyAssignments@2022-06-01' = {
  name: 'assign-dcr-to-all-vms'
  properties: {
    policyDefinitionId: '/providers/Microsoft.Authorization/policyDefinitions/...'
    parameters: {
      dcrResourceId: {
        value: centralDCR.id
      }
    }
  }
  scope: managementGroup('LandingZones')
}

Method 3: Activity Log Export (Subscription-Level)

For Activity Logs (audit trail of who did what), configure at subscription level:

Azure CLI to Configure Activity Log Export:

# For each Landing Zone subscription
az monitor diagnostic-settings subscription create \
  --name "send-to-central-law" \
  --subscription "<lz-subscription-id>" \
  --workspace "<central-law-resource-id>" \
  --logs '[{"category": "Administrative", "enabled": true},
          {"category": "Security", "enabled": true},
          {"category": "ServiceHealth", "enabled": true},
          {"category": "Alert", "enabled": true},
          {"category": "Recommendation", "enabled": true},
          {"category": "Policy", "enabled": true},
          {"category": "Autoscale", "enabled": true},
          {"category": "ResourceHealth", "enabled": true}]'

Implementation Architecture: Complete Flow

Summary: How Landing Zone Logs Flow to Central LAW

Log Type	Source	Method	Destination Table
Activity Logs	Subscription	Subscription Diagnostic Settings	`AzureActivity`
Resource Logs	Azure Resources	Resource Diagnostic Settings (Policy)	`AzureDiagnostics`
VM/Guest Logs	VMs, Arc Servers	DCR + Azure Monitor Agent (Policy)	`Event`, `Syslog`, `Perf`
Custom App Logs	Applications	Application Insights / Custom DCR	Custom Tables

Required RBAC for Central LAW

For policies to write to the Central LAW, the policy identity needs:

Role	Scope	Purpose
`Log Analytics Contributor`	Central LAW	Write diagnostic settings
`Monitoring Contributor`	Landing Zones MG	Create diagnostic settings on resources

Strategy 1: Azure Resource Graph for Real-Time Visibility

Azure Resource Graph queries work across all subscriptions the platform team has access to:

// Count resources by Landing Zone (Resource Group)
resources
| summarize ResourceCount = count() by subscriptionId, resourceGroup, type
| order by ResourceCount desc

// Find VMs not sending heartbeat (unhealthy)
resources
| where type == "microsoft.compute/virtualmachines"
| join kind=leftouter (
    Heartbeat
    | where TimeGenerated > ago(5m)
    | summarize LastHeartbeat = max(TimeGenerated) by ResourceId = tolower(_ResourceId)
) on $left.id == $right.ResourceId
| where isempty(LastHeartbeat)
| project subscriptionId, resourceGroup, name, LastHeartbeat

Strategy 2: Cross-Subscription Workbooks

Create workbooks that query across all Landing Zone subscriptions:

Key Workbook Features:

Feature	How It Helps with 400 Teams
Subscription Parameter	Multi-select dropdown to filter views
Resource Graph Tiles	Query all resources without Log Analytics
Cross-Workspace Queries	Union data from multiple LAWs
Grouping by Tags	Filter by `CostCenter`, `Owner`, `Application`

Strategy 3: Centralized Alert View (Without Ownership)

The platform team can view alerts across all subscriptions without taking ownership:

Portal Location: Azure Monitor → Alerts → "All Alerts" (filter by subscription scope)

Strategy 4: Activity Log Insights Workbook

Track what teams are doing in their subscriptions:

Activity Category	What It Shows
Administrative	Resource create/delete/modify
Security	RBAC changes, Key Vault access
Policy	Compliance evaluations, remediation
Alert	Alert rule modifications

Implementation: Cross-Subscription Visibility Dashboard

Dashboard Components:
  Row 1 - Overview:
    - Total Subscriptions: 400
    - Total Resources: ARG count
    - Active Alerts: Count by severity
    - Unhealthy Resources: Resource Health API
    
  Row 2 - By Team:
    - Table: Subscription | Owner | Resources | Alerts | Health %
    - Filter: Search by team name or tag
    
  Row 3 - Trends:
    - Alert trend (7 days)
    - Resource growth trend
    - Cost trend by team
    
  Row 4 - Action Items:
    - Non-compliant resources
    - Resources without diagnostics
    - VMs without Azure Monitor Agent

KQL: Cross-Workspace Query Pattern

// Query alerts across all Landing Zone workspaces
let workspaces = dynamic([
    "workspace('LZ-Team1-LAW').AlertsManagementResources",
    "workspace('LZ-Team2-LAW').AlertsManagementResources"
]);
union withsource=SourceWorkspace *
| where SourceWorkspace in (workspaces)
| summarize AlertCount = count() by SourceWorkspace, Severity
| order by AlertCount desc

Governance: What Platform Team Should Track (Not Own)

Metric	Why Track	Action if Threshold Breached
AMA Adoption	Ensure all VMs have agent	Remind teams, provide docs
Diagnostics Enabled	All resources sending logs	Policy remediation
Alert Coverage	Teams have alerts configured	Training session
Cost Anomalies	Unexpected spikes	Notify team owner

📖 Reference: Azure Monitor Enterprise Monitoring Architecture

Scenario 3: Azure Service Health Dashboards & Notifications

The Challenge

"How can the platform team get notified quickly when something happens on the Azure side - like Azure AD outages or Azure Front Door problems? We need service health dashboards and proactive notifications."

Understanding Azure Service Health

Azure Service Health provides three types of health information:

Health Type	What It Covers	Example
Service Issues	Active incidents affecting Azure services	"Azure Front Door - Increased latency in West Europe"
Planned Maintenance	Scheduled updates that may impact services	"Azure AD - Authentication upgrade on Jan 15"
Health Advisories	Issues requiring action but not outages	"Deprecation of TLS 1.0 for Storage"
Security Advisories	Security-related events	"Critical vulnerability in Azure Service X"
Resource Health	Health of YOUR specific resources	"Your VM vm-prod-001 is Unavailable"

Solution 1: Create Service Health Alerts

Step-by-step in Azure Portal:

Navigate to Service Health

Azure Portal → Service Health → Health Alerts → Add Service Health Alert

Configure Scope
- Select subscription(s)
- Select affected services (e.g., Azure Front Door, Azure AD, Azure Firewall)
- Select regions
Configure Conditions

Event Type Recommended
Service Issue ✅ Enable
Planned Maintenance ✅ Enable
Health Advisory ✅ Enable
Security Advisory ✅ Enable
Configure Action Group
- Email: Platform Team DL
- SMS: On-call numbers
- Webhook: Teams/Slack/PagerDuty

Event Type	Recommended
Service Issue	✅ Enable
Planned Maintenance	✅ Enable
Health Advisory	✅ Enable
Security Advisory	✅ Enable

Important: Service Health Alert Limitations

⚠️ Service Health alerts do NOT support Alert Processing Rules. You must configure the Action Group directly on the alert rule.

Solution 2: Deploy Service Health Alerts at Scale with AMBA

Use Azure Monitor Baseline Alerts (AMBA) to deploy Service Health alerts across all subscriptions via Policy:

AMBA Service Health Initiative includes alerts for:

Service Issues
Planned Maintenance
Health Advisories
Security Advisories

📖 Reference: Deploy Service Health Alerts at Scale

Solution 3: Service Health Dashboard (Workbook)

Create a Service Health Dashboard using Azure Workbooks:

Dashboard Sections:
  Section 1 - Current Status:
    - Active Service Issues (count)
    - Planned Maintenance (next 7 days)
    - Health Advisories (unread)
    
  Section 2 - By Service (Filter):
    - Azure AD / Entra ID
    - Azure Front Door
    - Azure Firewall
    - ExpressRoute
    - Azure Monitor
    
  Section 3 - Historical:
    - Service issues (last 90 days)
    - Impact by region
    - MTTR trends
    
  Section 4 - Resource Health:
    - Your resources current health
    - Resources with degraded status

KQL: Query Service Health Events

// Service Health events from Activity Log
AzureActivity
| where CategoryValue == "ServiceHealth"
| extend ServiceHealthEvent = parse_json(Properties)
| extend 
    EventType = tostring(ServiceHealthEvent.incidentType),
    Title = tostring(ServiceHealthEvent.title),
    Service = tostring(ServiceHealthEvent.impactedServices),
    Region = tostring(ServiceHealthEvent.impactedRegions),
    Status = tostring(ServiceHealthEvent.status)
| where TimeGenerated > ago(30d)
| project TimeGenerated, EventType, Title, Service, Region, Status
| order by TimeGenerated desc

Solution 4: Proactive Notification Channels

Channel	Use Case	Setup
Email	Standard notification	Action Group → Email
SMS	Critical on-call alerts	Action Group → SMS
Voice Call	Sev 0 outages	Action Group → Voice
Teams/Slack	ChatOps integration	Action Group → Webhook
PagerDuty/ServiceNow	Ticket integration	Action Group → Webhook/Logic App
Azure Mobile App	Push notifications	Enable in app settings

Bicep: Deploy Service Health Alert

resource actionGroup 'Microsoft.Insights/actionGroups@2023-01-01' = {
  name: 'ag-service-health-platform'
  location: 'Global'
  properties: {
    groupShortName: 'SvcHealth'
    enabled: true
    emailReceivers: [
      {
        name: 'Platform Team'
        emailAddress: 'platform-team@company.com'
        useCommonAlertSchema: true
      }
    ]
  }
}

resource serviceHealthAlert 'Microsoft.Insights/activityLogAlerts@2020-10-01' = {
  name: 'alert-service-health-all'
  location: 'Global'
  properties: {
    enabled: true
    scopes: [
      subscription().id
    ]
    condition: {
      allOf: [
        {
          field: 'category'
          equals: 'ServiceHealth'
        }
        {
          field: 'properties.incidentType'
          equals: 'Incident'  // or Maintenance, Informational, Security
        }
      ]
    }
    actions: {
      actionGroups: [
        {
          actionGroupId: actionGroup.id
        }
      ]
    }
  }
}

Recommended Service Health Alert Configuration

Service Category	Alert For	Priority
Identity	Azure AD / Entra ID, MFA	Critical
Networking	Front Door, Azure Firewall, ExpressRoute, VPN Gateway	Critical
Compute	Virtual Machines, VMSS, AKS	High
Storage	Storage Accounts, Azure Files	High
Monitoring	Azure Monitor, Log Analytics	High
Security	Key Vault, Defender for Cloud	Critical
Data	SQL, Cosmos DB	High

Quick Setup Checklist

Create Action Group in Global region
Add email, SMS, and webhook receivers
Create Service Health Alert Rule
Select all critical services and regions
Enable all event types (Issue, Maintenance, Advisory, Security)
Test by viewing historical events
Document escalation process for Sev 0

📖 Reference: Create Service Health Alerts in Azure Portal

Summary: Three Scenarios Answered

Scenario	Solution	Key Feature
Platform Monitoring	AMBA + Platform Dashboards	Policy-driven alerts for platform services
400 Teams Visibility	Cross-subscription Workbooks + Azure Resource Graph	See without owning
Service Health Notifications	Service Health Alerts + Action Groups	Proactive Azure outage notifications

Additional Resources

Resource	Link
Azure Monitor Baseline Alerts (AMBA)	https://aka.ms/amba
AMBA Deployment Guide	https://azure.github.io/azure-monitor-baseline-alerts/
Service Health Documentation	https://learn.microsoft.com/en-us/azure/service-health/
Azure Monitor Enterprise Architecture	https://learn.microsoft.com/en-us/azure/azure-monitor/fundamentals/enterprise-monitoring-architecture
Cross-Workspace Queries	https://learn.microsoft.com/en-us/azure/azure-monitor/logs/cross-workspace-query

Document Version: 1.0
Last Updated: January 2026
Author: Azure Platform Observability Guide

Table of Contents​

Scenario 1: Platform Monitoring vs. Landing Zone Monitoring​

The Challenge​

Understanding the Distinction​

Solution: Azure Monitor Baseline Alerts (AMBA)​

Platform Services Covered by AMBA​

Platform Dashboards for Large Organizations​

Recommended Workbooks for Platform Monitoring​

Implementation Steps​

Scenario 2: Visibility into 400+ Landing Zone Application Teams​

The Challenge​

The Challenge at Scale​

Solution Architecture: Federated Visibility Model​

How It's Done: Sending Landing Zone Logs to Central Platform LAW​

Method 1: Azure Policy with Diagnostic Settings (Recommended)​

Method 2: Data Collection Rules (DCR) with Azure Monitor Agent​

Method 3: Activity Log Export (Subscription-Level)​

Implementation Architecture: Complete Flow​

Summary: How Landing Zone Logs Flow to Central LAW​

Required RBAC for Central LAW​

Strategy 1: Azure Resource Graph for Real-Time Visibility​

Strategy 2: Cross-Subscription Workbooks​

Strategy 3: Centralized Alert View (Without Ownership)​

Strategy 4: Activity Log Insights Workbook​

Implementation: Cross-Subscription Visibility Dashboard​

KQL: Cross-Workspace Query Pattern​

Governance: What Platform Team Should Track (Not Own)​

Scenario 3: Azure Service Health Dashboards & Notifications​

The Challenge​

Understanding Azure Service Health​

Solution 1: Create Service Health Alerts​

Important: Service Health Alert Limitations​

Solution 2: Deploy Service Health Alerts at Scale with AMBA​

Solution 3: Service Health Dashboard (Workbook)​

KQL: Query Service Health Events​

Solution 4: Proactive Notification Channels​

Bicep: Deploy Service Health Alert​

Recommended Service Health Alert Configuration​

Quick Setup Checklist​

Summary: Three Scenarios Answered​

Additional Resources​

Table of Contents

Scenario 1: Platform Monitoring vs. Landing Zone Monitoring

The Challenge

Understanding the Distinction

Solution: Azure Monitor Baseline Alerts (AMBA)

Platform Services Covered by AMBA

Platform Dashboards for Large Organizations

Recommended Workbooks for Platform Monitoring

Implementation Steps

Scenario 2: Visibility into 400+ Landing Zone Application Teams

The Challenge

The Challenge at Scale

Solution Architecture: Federated Visibility Model

How It's Done: Sending Landing Zone Logs to Central Platform LAW

Method 1: Azure Policy with Diagnostic Settings (Recommended)

Method 2: Data Collection Rules (DCR) with Azure Monitor Agent

Method 3: Activity Log Export (Subscription-Level)

Implementation Architecture: Complete Flow

Summary: How Landing Zone Logs Flow to Central LAW

Required RBAC for Central LAW

Strategy 1: Azure Resource Graph for Real-Time Visibility

Strategy 2: Cross-Subscription Workbooks

Strategy 3: Centralized Alert View (Without Ownership)

Strategy 4: Activity Log Insights Workbook

Implementation: Cross-Subscription Visibility Dashboard

KQL: Cross-Workspace Query Pattern

Governance: What Platform Team Should Track (Not Own)

Scenario 3: Azure Service Health Dashboards & Notifications

The Challenge

Understanding Azure Service Health

Solution 1: Create Service Health Alerts

Important: Service Health Alert Limitations

Solution 2: Deploy Service Health Alerts at Scale with AMBA

Solution 3: Service Health Dashboard (Workbook)

KQL: Query Service Health Events

Solution 4: Proactive Notification Channels

Bicep: Deploy Service Health Alert

Recommended Service Health Alert Configuration

Quick Setup Checklist

Summary: Three Scenarios Answered

Additional Resources