Azure AI Foundry Cross-Region Private Endpoint Architecture
Executive Summary
This document outlines the network architecture for accessing GPT models deployed in Sweden Central from Germany West Central using private endpoints (cross-region private endpoint functionality), ensuring secure, compliant, and performant connectivity for enterprise AI workloads.
Cross-Region Private Endpoint Architecture
Customer Question:
"How does it work if we have the private endpoint in GWEC and want to use it for a model in Sweden?"
Answer: YES, this works with Azure's internal backbone routing!
How Cross-Region Private Endpoints Work:
- Private Endpoint Location: You create the private endpoint in YOUR region (Germany West Central)
- AI Foundry Hub Location: The AI Foundry Hub can be in ANY Azure region (Sweden Central)
- Azure Backbone Routing: Azure's internal network automatically routes traffic between regions
- No VNet Required in Target Region: You don't need a VNet in Sweden Central
Cross-Region Network Flow:
Your App (GWEC) → Private Endpoint (GWEC) → Azure Backbone → AI Foundry Hub (Sweden Central) → GPT Models
Frequently Asked Questions (FAQ)
Q: Do I need a VNet in Sweden Central to use models there with my private endpoint in Germany West Central?
A: NO! You only need:
- Private endpoint in YOUR region (Germany West Central)
- AI Foundry Hub in the target region (Sweden Central)
- Azure automatically handles the backbone routing between regions
Q: Is the connection still private when crossing regions?
A: YES! The entire path remains private:
- Your app → Private endpoint (private)
- Private endpoint → Azure backbone (private Microsoft network)
- Azure backbone → AI Foundry Hub (private)
- No traffic goes over public internet
Q: What's the performance difference between local and cross-region?
A: Performance Comparison:
- Local (GWEC to GWEC): <5ms latency
- Cross-region (GWEC to Sweden): ~15-25ms latency
- Both options provide excellent performance for most AI workloads
Q: Which approach should I choose?
A: Recommendations:
- For GPT-4o, O1, O3: Use local processing in Germany West Central (optimal performance)
- For GPT-5: Use cross-region to Sweden Central (when GPT-5 is required)
- Hybrid: Deploy both and route based on model requirements
⚠️ IMPORTANT LIMITATIONS AND EXCEPTIONS
Critical Limitation Found in Microsoft Documentation:
For Agent Service (Private Network Secured Environments):
- "All Foundry workspace resources must be deployed in the same region as the virtual network (VNet)"
- This includes: Cosmos DB, Storage Account, AI Search, Foundry Account, Project, Managed Identity, Azure OpenAI, or another Foundry resource used for model deployments
- Exception: This limitation applies specifically to Agent Service with private network isolation
For Standard AI Foundry Hub/Project:
- Cross-region private endpoints ARE supported ✅
- Private endpoint can be in different region than AI Foundry Hub
- Azure backbone routing works as described
Key Distinction:
- Standard AI Foundry: Cross-region private endpoints supported ✅
- Agent Service with Private Networks: Same-region requirement ⚠️
Recommendation:
- Verify with customer if they plan to use Agent Service or standard AI Foundry
- For standard workloads: Cross-region approach works
- For Agent Service: Consider regional deployment strategy
Architecture Overview
Detailed Network Components
1. Germany West Central Components
Customer Virtual Network (VNet)
- Address Space: 10.0.0.0/16 (customer-defined)
- Subnets:
- Application Subnet: 10.0.1.0/24
- Private Endpoint Subnet: 10.0.2.0/24
- Management Subnet: 10.0.3.0/24
Private Endpoint Configuration
Private Endpoint:
Name: pe-aifoundry-sweden-gwec
Location: Germany West Central
Subnet: 10.0.2.0/24
Target Resource: AI Foundry Hub (Sweden Central)
Target Sub-resource: amlworkspace
Private IP: 10.0.2.4
FQDN: aifoundry-sweden.privatelink.api.azureml.ms
Private DNS Zone
Private DNS Zone:
Zone Name: privatelink.api.azureml.ms
Linked VNets: [Customer-VNet-GWEC]
A Records:
- Name: aifoundry-sweden
IP: 10.0.2.4
- Name: aifoundry-sweden.westeurope
IP: 10.0.2.4
2. Sweden Central Components
AI Foundry Hub
AI Foundry Hub:
Name: aifoundry-sweden-hub
Location: Sweden Central
Resource Group: rg-aifoundry-sweden-prod
Public Network Access: Disabled
Identity: System Assigned Managed Identity
GPT-5 Model Deployment
GPT-5 Deployment:
Model: gpt-5 (2025-08-07)
Deployment Type: Standard (Regional)
Location: Sweden Central
Endpoint: https://aifoundry-sweden.openai.azure.com/
API Version: 2024-10-01-preview
Authentication: API Key + Azure AD
Network Flow Architecture
Security Architecture
Network Security Groups (NSG)
Application Subnet NSG
Inbound Rules:
- Name: Allow-HTTPS-Internal
Priority: 100
Source: 10.0.0.0/16
Destination: 10.0.1.0/24
Port: 443
Protocol: TCP
Action: Allow
Outbound Rules:
- Name: Allow-AI-Foundry
Priority: 100
Source: 10.0.1.0/24
Destination: PrivateEndpoint
Port: 443
Protocol: TCP
Action: Allow
Private Endpoint Subnet NSG
Inbound Rules:
- Name: Allow-Internal-HTTPS
Priority: 100
Source: 10.0.0.0/16
Destination: 10.0.2.0/24
Port: 443
Protocol: TCP
Action: Allow
Outbound Rules:
- Name: Allow-Azure-Backbone
Priority: 100
Source: 10.0.2.0/24
Destination: Internet
Port: 443
Protocol: TCP
Action: Allow
Authentication & Authorization
Authentication Methods:
1. Azure AD Service Principal:
- Client ID: <service-principal-id>
- Client Secret: <stored-in-key-vault>
- Tenant ID: <tenant-id>
2. Managed Identity:
- Type: System Assigned
- Scope: AI Foundry Hub Resource
3. API Key (Backup):
- Stored: Azure Key Vault
- Rotation: 90 days
RBAC Assignments:
- Role: Cognitive Services OpenAI User
- Principal: Uniper Application Service Principal
- Scope: AI Foundry Hub (Sweden Central)
Alternative Architecture Comparison
Option 1: Local Processing (Recommended for Performance)
Configuration: Everything in Germany West Central
- AI Foundry Hub: Germany West Central
- GPT Models: GPT-4o, GPT-4.1, O1, O3 (available locally)
- Private Endpoint: Germany West Central
- Benefits: <5ms latency, simplified setup, lower costs
- Use Case: Standard AI workloads, optimal performance
Option 2: Cross-Region for GPT-5 (When Specific Models Needed)
Configuration: Private endpoint in GWEC, AI Foundry in Sweden Central
- AI Foundry Hub: Sweden Central (for GPT-5 access)
- GPT Models: GPT-5 (only available in Sweden Central)
- Private Endpoint: Germany West Central (YOUR region)
- Azure Backbone Routing: Automatic cross-region connectivity
- Benefits: Access to GPT-5, still private connectivity, no Sweden VNet needed
- Latency: ~15-25ms (acceptable for most use cases)
- Use Case: When specific models not available locally
Cross-Region Architecture Diagram (Option 2)
Implementation Considerations
When to Use Cross-Region Architecture:
- Model Availability: Specific models only available in other regions
- Compliance Requirements: Need EU processing but local region lacks models
- Business Requirements: Specific model capabilities required
- Future-Proofing: Preparing for new model releases
When to Use Local Architecture:
- Performance Critical: Applications requiring <10ms latency
- Cost Optimization: Avoiding cross-region data transfer charges
- Simplicity: Reduced architectural complexity
- Available Models: Required models available locally
Data Residency & Compliance
Data Processing Locations
Data Flow:
Request Origin: Germany West Central
Network Transit: Azure Backbone (encrypted)
Processing Location: Sweden Central (EU)
Response Transit: Azure Backbone (encrypted)
Response Destination: Germany West Central
Compliance Framework:
- GDPR: ✅ EU-to-EU processing
- Data Residency: ✅ Sweden (EU member state)
- Encryption in Transit: ✅ TLS 1.3
- Encryption at Rest: ✅ Azure Storage Encryption
- Network Isolation: ✅ Private Endpoints
Performance Optimization
Latency Expectations
Network Latency Components:
GWEC to Sweden Central: ~15-25ms (updated based on Azure backbone)
Private Endpoint Overhead: ~2-5ms
AI Model Processing: ~500-2000ms
Total Expected Latency: ~520-2030ms
Performance Comparison:
Local Processing (GWEC): <5ms network latency
Cross-Region (GWEC to Sweden): ~15-25ms network latency
Difference: 10-20ms additional for cross-region
Optimization Strategies:
1. Connection Pooling: Reuse HTTPS connections
2. Request Batching: Combine multiple prompts
3. Async Processing: Non-blocking API calls
4. Caching: Cache responses where appropriate
Key Benefits of Cross-Region Approach:
- ✅ Single private endpoint in Germany West Central handles all connectivity
- ✅ Azure backbone provides secure, private routing cross-region
- ✅ No need to manage VNets in multiple regions
- ✅ Simplified network architecture
- ✅ Private connectivity maintained end-to-end
- ✅ Access to GPT-5 models not available in local region
Network Bandwidth
Bandwidth Allocation:
Expected Throughput: 1-10 Gbps
Request Size: 1KB - 10MB (typical)
Response Size: 1KB - 100KB (typical)
Concurrent Connections: 100-1000
Implementation Steps
Phase 1: Infrastructure Setup (Week 1)
- Create AI Foundry Hub in Sweden Central
# Create resource group
az group create --name rg-aifoundry-sweden-prod --location swedencentral
# Create AI Foundry Hub
az ml workspace create \
--name aifoundry-sweden-hub \
--resource-group rg-aifoundry-sweden-prod \
--location swedencentral \
--public-network-access Disabled
- Deploy GPT-5 Model
# Deploy GPT-5 model
az ml online-deployment create \
--name gpt5-deployment \
--model gpt-5:1 \
--workspace-name aifoundry-sweden-hub \
--resource-group rg-aifoundry-sweden-prod
Phase 2: Network Configuration (Week 2)
- Create Private Endpoint in GWEC
# Create private endpoint
az network private-endpoint create \
--name pe-aifoundry-sweden-gwec \
--resource-group rg-uniper-gwec-prod \
--vnet-name uniper-vnet-gwec \
--subnet private-endpoint-subnet \
--private-connection-resource-id "/subscriptions/<sub-id>/resourceGroups/rg-aifoundry-sweden-prod/providers/Microsoft.MachineLearningServices/workspaces/aifoundry-sweden-hub" \
--group-id amlworkspace \
--location germanywestcentral
- Configure Private DNS Zone
# Create private DNS zone
az network private-dns zone create \
--name privatelink.api.azureml.ms \
--resource-group rg-uniper-gwec-prod
# Link to VNet
az network private-dns link vnet create \
--name link-uniper-vnet \
--zone-name privatelink.api.azureml.ms \
--resource-group rg-uniper-gwec-prod \
--virtual-network uniper-vnet-gwec \
--registration-enabled false
Phase 3: Security Configuration (Week 3)
- Configure Network Security Groups
- Set up RBAC and Service Principal
- Configure Azure Key Vault for secrets
Phase 4: Testing & Validation (Week 4)
- Network connectivity testing
- Performance benchmarking
- Security validation
Monitoring & Observability
Network Monitoring
Metrics to Monitor:
- Private Endpoint Connection Status
- Network Latency (GWEC ↔ Sweden)
- Request/Response Throughput
- Failed Connection Attempts
- DNS Resolution Time
Alerting Thresholds:
- Latency > 3 seconds: Warning
- Connection Failures > 5%: Critical
- Private Endpoint Down: Critical
Application Monitoring
AI Foundry Metrics:
- Model Response Time
- Token Usage (Input/Output)
- Error Rates by HTTP Status
- Quota Utilization
- Model Availability
Logging Strategy:
- Application Logs: Log Analytics Workspace
- Network Logs: NSG Flow Logs
- AI Foundry Logs: Diagnostic Settings
Cost Optimization
Estimated Monthly Costs (EUR)
Network Components:
Private Endpoint: €7.50/month
DNS Zone: €0.50/month
Data Transfer (intra-EU): €0.02/GB
AI Foundry Components:
GPT-5 Usage: €0.03/1K tokens (input) + €0.06/1K tokens (output)
Compute Resources: Variable based on usage
Total Estimated (10M tokens/month): ~€500-800/month
Troubleshooting Guide
Common Issues & Solutions
-
DNS Resolution Failures
- Verify private DNS zone configuration
- Check VNet link association
- Validate A record entries
-
Connection Timeouts
- Review NSG rules
- Check private endpoint status
- Verify Azure Backbone connectivity
-
Authentication Errors
- Validate service principal permissions
- Check RBAC assignments
- Verify API key rotation
Security Considerations
Data Protection
- All traffic encrypted with TLS 1.3
- Private network isolation (no internet exposure)
- EU data processing compliance
- Regular security assessments
Access Control
- Principle of least privilege
- Role-based access control (RBAC)
- Regular access reviews
- Multi-factor authentication for management
Compliance & Governance
GDPR Compliance
✅ Data Processing Location: Sweden (EU member state) ✅ Data Controller: Uniper (Germany) ✅ Data Processor: Microsoft Azure (EU operations) ✅ Cross-border Transfer: EU-to-EU (Article 28 compliant)
Internal Governance
- Change management process
- Security review requirements
- Regular compliance audits
- Incident response procedures
This architecture provides a secure, compliant, and performant solution for accessing GPT-5 models in Sweden Central from Germany West Central while maintaining EU data residency requirements.