Platform Observability Scenarios for Large-Scale Azure Landing Zones

Document Purpose: Solutions to common enterprise challenges around platform-level monitoring, visibility into landing zone application teams, and Azure Service Health notifications.


Table of Contents

  1. Scenario 1: Platform Monitoring vs. Landing Zone Monitoring
  2. Scenario 2: Visibility into 400+ Landing Zone Application Teams
  3. Scenario 3: Azure Service Health Dashboards & Notifications

Scenario 1: Platform Monitoring vs. Landing Zone Monitoring

The Challenge

"Monitoring for the platform itself and monitoring for the landing zone users is a separate thing. We want platform observability for our large platform - dashboards, alerting, and health monitoring."

Understanding the Distinction

Aspect | Platform Monitoring | Landing Zone (Workload) Monitoring
Scope | Central platform services (Identity, Connectivity, Management) | Individual application workloads
Owner | Central Platform Team | Landing Zone / Application Teams
Focus | Infrastructure health, shared services, governance | Application performance, business metrics
Subscription | Management, Connectivity, Identity subscriptions | Landing Zone subscriptions
Examples | ExpressRoute, Azure Firewall, Key Vault, DNS, LAW | VMs, App Services, Databases, Storage

Solution: Azure Monitor Baseline Alerts (AMBA)

Microsoft provides Azure Monitor Baseline Alerts (AMBA) - a policy-driven framework specifically designed for Azure Landing Zone platform monitoring.

Platform Services Covered by AMBA

Platform Component | Alert Types | Metrics Monitored
Azure ExpressRoute | Circuit availability, BGP status | BitsInPerSecond, BitsOutPerSecond
Azure Firewall | SNAT port utilization, health | FirewallHealth, ThroughputBitsPerSecond
Azure Virtual Network | N/A (activity-based) | DDoS alerts, NSG flow
Azure Virtual WAN | Hub health, tunnel status | TunnelBandwidth, BGPPeerStatus
Log Analytics Workspace | Ingestion, latency | IngestionVolumeMB, IngestionLatencyInSeconds
Azure Private DNS | Query volume, failures | QueryVolume, RecordSetCapacityUtilization
Azure Key Vault | Availability, saturation | Availability, SaturationShoebox
Azure Storage Account | Availability, latency | Availability, SuccessE2ELatency

Platform Dashboards for Large Organizations

For a large platform with many services, implement these visualization layers:

Workbook | Purpose | Source
Activity Log Insights | Platform changes, who did what, admin actions | Built-in
Network Insights | ExpressRoute, VPN, Firewall, VNet health | Azure Portal
Key Vault Insights | Vault operations, access patterns, failures | Azure Portal
Storage Insights | Storage account health across subscriptions | Azure Portal
LAW Health | Workspace ingestion, query performance, cost | Custom/AMBA

Implementation Steps

  1. Deploy AMBA via Policy

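    Deploy the AMBA policy definitions and initiatives to your management group hierarchy, using the AMBA portal accelerator or one of the infrastructure-as-code options described in the AMBA deployment guide.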
  2. Configure Action Groups (one per subscription minimum)

    • Email notification channel
    • Teams/Slack webhook for chat ops
    • Logic App for ticketing integration
  3. Create Platform Dashboard

    • Pin critical metrics from each platform service
    • Group by: Connectivity | Identity | Management
  4. Set up Workbooks

    • Use built-in Insights workbooks
    • Create custom workbooks for organization-specific views

📖 Reference: Monitor Azure Platform Landing Zone Components


Scenario 2: Visibility into 400+ Landing Zone Application Teams

The Challenge

"The platform team should be aware of what's happening to application teams which they onboarded to the landing zones. With 400+ teams, how do we gain visibility without overwhelming the central infrastructure?"

The Challenge at Scale

Challenge | Impact
400+ teams = 400+ subscriptions | Cannot monitor each individually
Each team owns their workloads | Platform team shouldn't manage app-level alerts
Need visibility without control | See what's happening, not micromanage
Different maturity levels | Some teams are advanced, some need guidance

Solution Architecture: Federated Visibility Model

How It's Done: Sending Landing Zone Logs to Central Platform LAW

💡 Key Question: "If dashboards filter by Landing Zone, all landing zones must send their logs to the centralized platform Log Analytics Workspace. But how is this done?"

There are three methods to centralize logs from 400+ Landing Zone subscriptions:

Method 1: Azure Policy (DeployIfNotExists) for Diagnostic Settings

Use DeployIfNotExists (DINE) policies to automatically configure diagnostic settings on all resources:

Policy Configuration:

{
  "if": {
    "field": "type",
    "equals": "Microsoft.Compute/virtualMachines"
  },
  "then": {
    "effect": "DeployIfNotExists",
    "details": {
      "type": "Microsoft.Insights/diagnosticSettings",
      "existenceCondition": {
        "field": "Microsoft.Insights/diagnosticSettings/workspaceId",
        "equals": "[parameters('centralLogAnalyticsWorkspaceId')]"
      },
      "roleDefinitionIds": [
        "/providers/Microsoft.Authorization/roleDefinitions/..."
      ],
      "deployment": {
        "properties": {
          "mode": "incremental",
          "template": {
            // ARM template to create diagnostic setting
          }
        }
      }
    }
  }
}

Built-in Policies Available:

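Resource Type | Policy Name
Virtual Machines | Configure Azure VMs to send logs to LAW
Storage Accounts | Configure Storage to send logs to LAW
Key Vault | Configure Key Vault to send logs to LAW
App Service | Configure App Service to send logs to LAW
SQL Database | Configure SQL to send logs to LAW
AKS | Configure AKS to send logs to LAW

The policy names above are shorthand; look up the exact display names in the Azure Policy built-in definitions before assigning them. Diagnostic settings cover platform logs and metrics only. Guest OS logs from virtual machines are collected through the Azure Monitor Agent (see Method 2).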
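
Where no built-in definition fits, the policy rule shown above differs per resource type only in the `type` it matches, so one way to keep a set of custom definitions consistent is to generate them. The following is a minimal sketch under stated assumptions: `build_dine_rule` and the `RESOURCE_TYPES` list are hypothetical and not part of any Azure SDK, and the role definition IDs and ARM template are left as placeholders, as in the example above.

```python
import json

# Hypothetical list of resource types to cover; adjust to your own estate.
RESOURCE_TYPES = [
    "Microsoft.KeyVault/vaults",
    "Microsoft.Storage/storageAccounts",
    "Microsoft.Sql/servers/databases",
]


def build_dine_rule(resource_type: str) -> dict:
    """Return a DeployIfNotExists policy rule for one resource type."""
    return {
        "if": {"field": "type", "equals": resource_type},
        "then": {
            "effect": "DeployIfNotExists",
            "details": {
                "type": "Microsoft.Insights/diagnosticSettings",
                "existenceCondition": {
                    "field": "Microsoft.Insights/diagnosticSettings/workspaceId",
                    "equals": "[parameters('centralLogAnalyticsWorkspaceId')]",
                },
                # Placeholder: supply the role definition IDs remediation needs.
                "roleDefinitionIds": [],
                # Placeholder: the ARM template that creates the diagnostic setting.
                "deployment": {
                    "properties": {"mode": "incremental", "template": {}}
                },
            },
        },
    }


# One rule per resource type, keyed by the type it targets.
rules = {rt: build_dine_rule(rt) for rt in RESOURCE_TYPES}

if __name__ == "__main__":
    print(json.dumps(rules["Microsoft.KeyVault/vaults"], indent=2))
```

Generating the rules from one function means a change to the existence condition or deployment settings is made once rather than per definition.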

Method 2: Data Collection Rules (DCR) with Azure Monitor Agent

For VMs and Arc-enabled servers, use Data Collection Rules:

DCR Assignment via Policy:

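// Deploy at management group scope, for example:
//   az deployment mg create --management-group-id LandingZones --location <region> ...
targetScope = 'managementGroup'

param centralDcrId string // resource ID of the central Data Collection Rule
param location string     // region for the assignment's managed identity

resource dcrAssignmentPolicy 'Microsoft.Authorization/policyAssignments@2022-06-01' = {
  name: 'assign-dcr-to-all-vms'
  location: location
  identity: {
    type: 'SystemAssigned' // DeployIfNotExists remediation runs under this identity
  }
  properties: {
    policyDefinitionId: '/providers/Microsoft.Authorization/policyDefinitions/...'
    parameters: {
      dcrResourceId: {
        value: centralDcrId
      }
    }
  }
}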
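
For orientation, a Data Collection Rule has three parts: data sources (what to collect), destinations (where to send it), and data flows (which stream goes to which destination). The sketch below builds a minimal rule body as a Python dictionary and checks that every data flow points at a declared destination. The names `build_dcr_body`, `undeclared_destinations`, and `central-law`, as well as the chosen event, facility, and counter filters, are illustrative assumptions; verify property names against the DCR schema reference before deploying.

```python
def build_dcr_body(workspace_resource_id: str, location: str) -> dict:
    """Return a minimal DCR body for Windows events, Syslog and perf counters."""
    destination = "central-law"  # hypothetical destination name
    return {
        "location": location,
        "properties": {
            "dataSources": {
                "windowsEventLogs": [{
                    "name": "windows-events",
                    "streams": ["Microsoft-Event"],
                    # Critical, error and warning events from the System log.
                    "xPathQueries": [
                        "System!*[System[(Level=1 or Level=2 or Level=3)]]"
                    ],
                }],
                "syslog": [{
                    "name": "linux-syslog",
                    "streams": ["Microsoft-Syslog"],
                    "facilityNames": ["auth", "daemon", "syslog"],
                    "logLevels": ["Warning", "Error", "Critical",
                                  "Alert", "Emergency"],
                }],
                "performanceCounters": [{
                    "name": "perf",
                    "streams": ["Microsoft-Perf"],
                    "samplingFrequencyInSeconds": 60,
                    "counterSpecifiers": [
                        "\\Processor Information(_Total)\\% Processor Time"
                    ],
                }],
            },
            "destinations": {
                "logAnalytics": [{
                    "name": destination,
                    "workspaceResourceId": workspace_resource_id,
                }],
            },
            "dataFlows": [{
                "streams": ["Microsoft-Event", "Microsoft-Syslog",
                            "Microsoft-Perf"],
                "destinations": [destination],
            }],
        },
    }


def undeclared_destinations(dcr: dict) -> set:
    """Return destination names used by data flows but never declared."""
    props = dcr["properties"]
    declared = {d["name"]
                for dests in props["destinations"].values() for d in dests}
    used = {name for flow in props["dataFlows"]
            for name in flow["destinations"]}
    return used - declared
```

A check like `undeclared_destinations` is cheap to run in a pipeline before the rule is deployed and assigned to hundreds of subscriptions.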

Method 3: Activity Log Export (Subscription-Level)

For Activity Logs (audit trail of who did what), configure at subscription level:

Azure CLI to Configure Activity Log Export:

# For each Landing Zone subscription
az monitor diagnostic-settings subscription create \
  --name "send-to-central-law" \
  --subscription "<lz-subscription-id>" \
  --workspace "<central-law-resource-id>" \
  --logs '[{"category": "Administrative", "enabled": true},
           {"category": "Security", "enabled": true},
           {"category": "ServiceHealth", "enabled": true},
           {"category": "Alert", "enabled": true},
           {"category": "Recommendation", "enabled": true},
           {"category": "Policy", "enabled": true},
           {"category": "Autoscale", "enabled": true},
           {"category": "ResourceHealth", "enabled": true}]'
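
Running that command by hand for 400+ subscriptions is impractical, so the loop is usually scripted. The sketch below only builds the argument lists and renders them as shell lines; it does not call Azure. It reuses the flags and categories from the command above, while `build_export_command` and `render_script` are hypothetical helper names. The subscription IDs are assumed to come from your own inventory.

```python
import json
import shlex

# The eight Activity Log categories used in the CLI example above.
CATEGORIES = [
    "Administrative", "Security", "ServiceHealth", "Alert",
    "Recommendation", "Policy", "Autoscale", "ResourceHealth",
]


def build_export_command(subscription_id: str, workspace_id: str) -> list:
    """Return the az CLI argument list for one subscription."""
    logs = [{"category": c, "enabled": True} for c in CATEGORIES]
    return [
        "az", "monitor", "diagnostic-settings", "subscription", "create",
        "--name", "send-to-central-law",
        "--subscription", subscription_id,
        "--workspace", workspace_id,
        "--logs", json.dumps(logs),
    ]


def render_script(subscription_ids: list, workspace_id: str) -> str:
    """Render one safely quoted shell line per subscription."""
    return "\n".join(
        shlex.join(build_export_command(sub, workspace_id))
        for sub in subscription_ids
    )


if __name__ == "__main__":
    print(render_script(["<lz-subscription-id>"], "<central-law-resource-id>"))
```

Keeping the command as a list of arguments also lets it be passed directly to `subprocess.run` without shell quoting issues, if you prefer to execute it from Python.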

Implementation Architecture: Complete Flow

Summary: How Landing Zone Logs Flow to Central LAW

Log Type | Source | Method | Destination Table
Activity Logs | Subscription | Subscription Diagnostic Settings | AzureActivity
Resource Logs | Azure Resources | Resource Diagnostic Settings (Policy) | AzureDiagnostics
VM/Guest Logs | VMs, Arc Servers | DCR + Azure Monitor Agent (Policy) | Event, Syslog, Perf
Custom App Logs | Applications | Application Insights / Custom DCR | Custom Tables

Required RBAC for Central LAW

For policies to write to the Central LAW, the policy identity needs:

Role | Scope | Purpose
Log Analytics Contributor | Central LAW | Write diagnostic settings
Monitoring Contributor | Landing Zones MG | Create diagnostic settings on resources

Strategy 1: Azure Resource Graph for Real-Time Visibility

Azure Resource Graph queries work across all subscriptions the platform team has access to:

// Count resources by subscription (Landing Zone), resource group, and type
resources
| summarize ResourceCount = count() by subscriptionId, resourceGroup, type
| order by ResourceCount desc

// Find VMs with no heartbeat in the last 5 minutes.
// Heartbeat is a Log Analytics table, so run this query in Log Analytics and
// pull the Resource Graph data in through arg(""). It does not run in
// Resource Graph Explorer.
arg("").Resources
| where type == "microsoft.compute/virtualmachines"
| extend ResourceId = tolower(id)
| join kind=leftouter (
    Heartbeat
    | where TimeGenerated > ago(5m)
    | summarize LastHeartbeat = max(TimeGenerated) by ResourceId = tolower(_ResourceId)
) on ResourceId
| where isnull(LastHeartbeat)
| project subscriptionId, resourceGroup, name
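
Outside the portal, Resource Graph queries can be submitted through its REST endpoint (`POST https://management.azure.com/providers/Microsoft.ResourceGraph/resources`), which returns results in pages. The sketch below shows the paging loop only. Authentication and the HTTP call are deliberately left out and replaced by a caller-supplied `send` function, which is a hypothetical stand-in for whatever HTTP client you use. Check the request and response field names against the current API reference.

```python
from typing import Callable, List


def query_all(send: Callable[[dict], dict], query: str,
              subscriptions: List[str], page_size: int = 1000) -> List[dict]:
    """Run a Resource Graph query and follow $skipToken until all rows are read.

    `send` takes the JSON request body and returns the parsed JSON response.
    """
    rows: List[dict] = []
    token = None
    while True:
        options = {"$top": page_size, "resultFormat": "objectArray"}
        if token:
            # Continue from where the previous page stopped.
            options["$skipToken"] = token
        response = send({
            "subscriptions": subscriptions,
            "query": query,
            "options": options,
        })
        rows.extend(response["data"])
        token = response.get("$skipToken")
        if not token:
            return rows
```

Injecting `send` keeps the loop testable without network access and lets the same code run against a real client or a recorded response.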

Strategy 2: Cross-Subscription Workbooks

Create workbooks that query across all Landing Zone subscriptions:

Key Workbook Features:

Feature | How It Helps with 400 Teams
Subscription Parameter | Multi-select dropdown to filter views
Resource Graph Tiles | Query all resources without Log Analytics
Cross-Workspace Queries | Union data from multiple LAWs
Grouping by Tags | Filter by CostCenter, Owner, Application

Strategy 3: Centralized Alert View (Without Ownership)

The platform team can view alerts across all subscriptions without taking ownership:

Portal Location: Azure Monitor → Alerts → "All Alerts" (filter by subscription scope)

Strategy 4: Activity Log Insights Workbook

Track what teams are doing in their subscriptions:

Activity Category | What It Shows
Administrative | Resource create/delete/modify
Security | RBAC changes, Key Vault access
Policy | Compliance evaluations, remediation
Alert | Alert rule modifications

Implementation: Cross-Subscription Visibility Dashboard

Dashboard Components:
Row 1 - Overview:
- Total Subscriptions: 400
- Total Resources: ARG count
- Active Alerts: Count by severity
- Unhealthy Resources: Resource Health API

Row 2 - By Team:
- Table: Subscription | Owner | Resources | Alerts | Health %
- Filter: Search by team name or tag

Row 3 - Trends:
- Alert trend (7 days)
- Resource growth trend
- Cost trend by team

Row 4 - Action Items:
- Non-compliant resources
- Resources without diagnostics
- VMs without Azure Monitor Agent

KQL: Cross-Workspace Query Pattern

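// Cross-workspace pattern (Log Analytics): union the same table from each
// Landing Zone workspace, then aggregate.
union
    workspace('LZ-Team1-LAW').AzureActivity,
    workspace('LZ-Team2-LAW').AzureActivity
| where TimeGenerated > ago(1d)
| summarize EventCount = count() by SubscriptionId, CategoryValue
| order by EventCount desc

// Alerts across subscriptions (Azure Resource Graph): fired alerts are exposed
// through the alertsmanagementresources table, not through a workspace table.
alertsmanagementresources
| where type == "microsoft.alertsmanagement/alerts"
| extend Severity = tostring(properties.essentials.severity)
| summarize AlertCount = count() by subscriptionId, Severity
| order by AlertCount desc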
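
With hundreds of team workspaces, a single `union` over all of them is not an option: cross-workspace queries are limited in how many workspaces one query may reference (100 at the time of writing, an assumption worth re-checking against current limits). A minimal sketch of splitting the workspace list into batches and generating one query per batch follows; `build_union_queries` and the workspace naming pattern are hypothetical.

```python
from typing import List

# Assumed per-query workspace limit; verify against current Azure Monitor limits.
MAX_WORKSPACES_PER_QUERY = 100


def build_union_queries(workspaces: List[str], table: str, tail: str,
                        batch_size: int = MAX_WORKSPACES_PER_QUERY) -> List[str]:
    """Return one KQL union query per batch of workspaces."""
    queries = []
    for start in range(0, len(workspaces), batch_size):
        batch = workspaces[start:start + batch_size]
        sources = ",\n    ".join(
            f"workspace('{name}').{table}" for name in batch
        )
        queries.append(f"union\n    {sources}\n{tail}")
    return queries


if __name__ == "__main__":
    names = [f"LZ-Team{i}-LAW" for i in range(1, 4)]
    tail = "| summarize EventCount = count() by SubscriptionId"
    print(build_union_queries(names, "AzureActivity", tail)[0])
```

The results of the per-batch queries still have to be merged by the caller, which is one reason the central workspace approach described earlier tends to scale better than querying each team workspace.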

Governance: What Platform Team Should Track (Not Own)

Metric | Why Track | Action if Threshold Breached
AMA Adoption | Ensure all VMs have agent | Remind teams, provide docs
Diagnostics Enabled | All resources sending logs | Policy remediation
Alert Coverage | Teams have alerts configured | Training session
Cost Anomalies | Unexpected spikes | Notify team owner
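
As an illustration of tracking without owning, the AMA adoption metric can be computed per team from inventory counts and compared with a threshold. The sketch below assumes the counts are already available, for example from Resource Graph. The 90% threshold, the `TeamInventory` type, and the function names are hypothetical and not values taken from this guide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TeamInventory:
    team: str
    total_vms: int
    vms_with_ama: int


def adoption_percent(inv: TeamInventory) -> float:
    """Percentage of a team's VMs that run the Azure Monitor Agent."""
    if inv.total_vms == 0:
        return 100.0  # nothing to onboard, so treat as fully adopted
    return round(100.0 * inv.vms_with_ama / inv.total_vms, 1)


def teams_to_remind(inventories: List[TeamInventory],
                    threshold: float = 90.0) -> List[str]:
    """Return the teams whose adoption is below the threshold."""
    return [inv.team for inv in inventories
            if adoption_percent(inv) < threshold]
```

The output is a list of teams to contact, which matches the "remind teams, provide docs" action rather than having the platform team fix the workloads itself.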

📖 Reference: Azure Monitor Enterprise Monitoring Architecture


Scenario 3: Azure Service Health Dashboards & Notifications

The Challenge

"How can the platform team get notified quickly when something happens on the Azure side - like Azure AD outages or Azure Front Door problems? We need service health dashboards and proactive notifications."

Understanding Azure Service Health

Azure Service Health provides the following types of health information:

Health Type | What It Covers | Example
Service Issues | Active incidents affecting Azure services | "Azure Front Door - Increased latency in West Europe"
Planned Maintenance | Scheduled updates that may impact services | "Azure AD - Authentication upgrade on Jan 15"
Health Advisories | Issues requiring action but not outages | "Deprecation of TLS 1.0 for Storage"
Security Advisories | Security-related events | "Critical vulnerability in Azure Service X"
Resource Health | Health of YOUR specific resources | "Your VM vm-prod-001 is Unavailable"

Solution 1: Create Service Health Alerts

Step-by-step in Azure Portal:

  1. Navigate to Service Health

    Azure Portal → Service Health → Health Alerts → Add Service Health Alert
  2. Configure Scope

    • Select subscription(s)
    • Select affected services (e.g., Azure Front Door, Azure AD, Azure Firewall)
    • Select regions
  3. Configure Conditions

    Event Type | Recommended
    Service Issue | ✅ Enable
    Planned Maintenance | ✅ Enable
    Health Advisory | ✅ Enable
    Security Advisory | ✅ Enable
  4. Configure Action Group

    • Email: Platform Team DL
    • SMS: On-call numbers
    • Webhook: Teams/Slack/PagerDuty

Important: Service Health Alert Limitations

⚠️ Service Health alerts do NOT support Alert Processing Rules. You must configure the Action Group directly on the alert rule.

Solution 2: Deploy Service Health Alerts at Scale with AMBA

Use Azure Monitor Baseline Alerts (AMBA) to deploy Service Health alerts across all subscriptions via Policy:

AMBA Service Health Initiative includes alerts for:

  • Service Issues
  • Planned Maintenance
  • Health Advisories
  • Security Advisories

📖 Reference: Deploy Service Health Alerts at Scale

Solution 3: Service Health Dashboard (Workbook)

Create a Service Health Dashboard using Azure Workbooks:

Dashboard Sections:
Section 1 - Current Status:
- Active Service Issues (count)
- Planned Maintenance (next 7 days)
- Health Advisories (unread)

Section 2 - By Service (Filter):
- Azure AD / Entra ID
- Azure Front Door
- Azure Firewall
- ExpressRoute
- Azure Monitor

Section 3 - Historical:
- Service issues (last 90 days)
- Impact by region
- MTTR trends

Section 4 - Resource Health:
- Your resources current health
- Resources with degraded status

KQL: Query Service Health Events

// Service Health events from Activity Log.
// Property names follow the Service Health event schema of the Activity Log;
// verify them against a sample event in your workspace.
AzureActivity
| where TimeGenerated > ago(30d)
| where CategoryValue == "ServiceHealth"
| extend ServiceHealthEvent = parse_json(Properties)
| extend
    EventType = tostring(ServiceHealthEvent.incidentType),
    Title = tostring(ServiceHealthEvent.title),
    Service = tostring(ServiceHealthEvent.service),
    Region = tostring(ServiceHealthEvent.region),
    Stage = tostring(ServiceHealthEvent.stage)
| project TimeGenerated, EventType, Title, Service, Region, Stage
| order by TimeGenerated desc

Solution 4: Proactive Notification Channels

Channel | Use Case | Setup
Email | Standard notification | Action Group → Email
SMS | Critical on-call alerts | Action Group → SMS
Voice Call | Sev 0 outages | Action Group → Voice
Teams/Slack | ChatOps integration | Action Group → Webhook
PagerDuty/ServiceNow | Ticket integration | Action Group → Webhook/Logic App
Azure Mobile App | Push notifications | Enable in app settings
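
For the webhook channels, the receiving side has to turn the alert payload into a readable message. The sketch below assumes the Action Group sends the common alert schema and that Service Health details arrive under `data.alertContext.properties`; treat those field names as assumptions to confirm against a real test notification. `summarize_service_health` is a hypothetical helper.

```python
from typing import Optional


def summarize_service_health(payload: dict) -> Optional[str]:
    """Return a one-line chat message for a Service Health alert, else None."""
    data = payload.get("data") or {}
    essentials = data.get("essentials") or {}
    if essentials.get("monitoringService") != "ServiceHealth":
        return None  # not a Service Health alert; let other handlers decide
    props = (data.get("alertContext") or {}).get("properties") or {}
    return (
        "[{incident}] {title} | service: {service} | "
        "region: {region} | stage: {stage}"
    ).format(
        incident=props.get("incidentType", "Unknown"),
        title=props.get("title", "(no title)"),
        service=props.get("service", "(unknown)"),
        region=props.get("region", "(unknown)"),
        stage=props.get("stage", "(unknown)"),
    )


if __name__ == "__main__":
    sample = {
        "schemaId": "azureMonitorCommonAlertSchema",
        "data": {
            "essentials": {"monitoringService": "ServiceHealth"},
            "alertContext": {"properties": {
                "incidentType": "Incident",
                "title": "Increased latency",
                "service": "Azure Front Door",
                "region": "West Europe",
                "stage": "Active",
            }},
        },
    }
    print(summarize_service_health(sample))
```

Every lookup uses a default so that a payload with missing fields still produces a message instead of raising inside the webhook handler.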

Bicep: Deploy Service Health Alert

resource actionGroup 'Microsoft.Insights/actionGroups@2023-01-01' = {
  name: 'ag-service-health-platform'
  location: 'Global'
  properties: {
    groupShortName: 'SvcHealth'
    enabled: true
    emailReceivers: [
      {
        name: 'Platform Team'
        emailAddress: 'platform-team@company.com'
        useCommonAlertSchema: true
      }
    ]
  }
}

resource serviceHealthAlert 'Microsoft.Insights/activityLogAlerts@2020-10-01' = {
  name: 'alert-service-health-all'
  location: 'Global'
  properties: {
    enabled: true
    scopes: [
      subscription().id
    ]
    condition: {
      allOf: [
        {
          field: 'category'
          equals: 'ServiceHealth'
        }
        {
          field: 'properties.incidentType'
          equals: 'Incident' // or Maintenance, Informational, Security
        }
      ]
    }
    actions: {
      actionGroups: [
        {
          actionGroupId: actionGroup.id
        }
      ]
    }
  }
}

Recommended alert priority by service category:

Service Category | Alert For | Priority
Identity | Azure AD / Entra ID, MFA | Critical
Networking | Front Door, Azure Firewall, ExpressRoute, VPN Gateway | Critical
Compute | Virtual Machines, VMSS, AKS | High
Storage | Storage Accounts, Azure Files | High
Monitoring | Azure Monitor, Log Analytics | High
Security | Key Vault, Defender for Cloud | Critical
Data | SQL, Cosmos DB | High
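
The priority table can be applied automatically when a Service Health notification arrives, for example to decide whether to page the on-call engineer. The sketch below maps a service name to a priority by keyword. The keyword lists are an assumption derived from the table, and the exact service names that appear in Service Health events should be checked against real events before relying on the match.

```python
from typing import Optional

# Keywords per priority, derived from the table above (assumed spellings).
PRIORITY_KEYWORDS = {
    "Critical": [
        "entra", "active directory", "multi-factor", "front door",
        "firewall", "expressroute", "vpn gateway", "key vault",
        "defender for cloud",
    ],
    "High": [
        "virtual machine", "kubernetes", "storage", "azure files",
        "azure monitor", "log analytics", "sql", "cosmos db",
    ],
}


def priority_for(service_name: str) -> Optional[str]:
    """Return 'Critical' or 'High' for a known service, else None."""
    name = service_name.lower()
    # Check Critical first so the stricter priority wins on overlap.
    for priority in ("Critical", "High"):
        if any(keyword in name for keyword in PRIORITY_KEYWORDS[priority]):
            return priority
    return None
```

Returning `None` for unknown services makes the gap visible, so new services are triaged deliberately instead of silently inheriting a default priority.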

Quick Setup Checklist

  • Create Action Group in Global region
  • Add email, SMS, and webhook receivers
  • Create Service Health Alert Rule
  • Select all critical services and regions
  • Enable all event types (Issue, Maintenance, Advisory, Security)
  • Test by viewing historical events
  • Document escalation process for Sev 0

📖 Reference: Create Service Health Alerts in Azure Portal


Summary: Three Scenarios Answered

Scenario | Solution | Key Feature
Platform Monitoring | AMBA + Platform Dashboards | Policy-driven alerts for platform services
400 Teams Visibility | Cross-subscription Workbooks + Azure Resource Graph | See without owning
Service Health Notifications | Service Health Alerts + Action Groups | Proactive Azure outage notifications

Additional Resources

Resource | Link
Azure Monitor Baseline Alerts (AMBA) | https://aka.ms/amba
AMBA Deployment Guide | https://azure.github.io/azure-monitor-baseline-alerts/
Service Health Documentation | https://learn.microsoft.com/en-us/azure/service-health/
Azure Monitor Enterprise Architecture | https://learn.microsoft.com/en-us/azure/azure-monitor/fundamentals/enterprise-monitoring-architecture
Cross-Workspace Queries | https://learn.microsoft.com/en-us/azure/azure-monitor/logs/cross-workspace-query

Document Version: 1.0
Last Updated: January 2026
Author: Azure Platform Observability Guide