Background
Several years of Azure usage, one subscription, no real structure. Services had been added as needs arose, environments shared the same space, and the security controls that existed were more conceptual than enforced.
The initial ask was a security review. But once the scope of what needed to change became clear, it became something more fundamental: migrate the entire environment to a new subscription structure and rebuild the security model from the ground up. Both things had to happen without taking anything offline.
Anil Choudhary led the work from architecture through to operational handover.
The Challenge
Five interconnected structural problems. None of them could be addressed incrementally without the others getting in the way.
Fragmented, Co-Located Architecture
All workloads — development, testing, and production — were deployed within a single Azure subscription with no logical or policy-based separation:
- No centralized ingress or routing layer; individual services were exposed independently
- Dev and production workloads sharing the same network space, increasing blast radius for any misconfiguration
- Tightly coupled services with undocumented dependencies — making migration planning complex
- No repeatable deployment pattern; every resource had been provisioned ad hoc
This architecture could not scale and could not be governed effectively in its current form.
Pervasive Security Exposure
The security posture was critically weak:
- No Web Application Firewall — application endpoints received unfiltered internet traffic
- No centralized network firewall — outbound and lateral traffic were uninspected
- No private endpoints — PaaS services communicated over public internet paths
- No secrets management — credentials and connection strings embedded in application configurations
- No identity-first security model — access was largely based on shared credentials with no MFA enforcement
For an organization handling sensitive enterprise workflows, the exposure wasn't theoretical. It was active and ongoing.
Network and Communication Inefficiencies
Service-to-service communication traversed the public internet by default:
- Increased latency on internal API calls
- No VNet integration for App Services or PaaS resources
- No private DNS resolution — services resolved over public endpoints
- No subnet segmentation to contain traffic between application tiers
Absent Governance
There was no operational governance framework:
- Resources were named inconsistently, making ownership and cost attribution impossible
- No mandatory tagging — environment, owner, and cost centre were untracked
- No Azure Policy enforcement — any user with contributor access could deploy anything
- No access review process — permissions had accumulated over time without audit
Migration Complexity
The requirement to migrate the existing platform to a new subscription structure introduced significant execution risk:
- Tight dependencies between services that had never been mapped
- No export/import playbook for the services in use
- Risk of data loss during database migration
- DNS and certificate dependencies requiring coordinated cutover
- No rollback plan existed
Architecture Design
The target architecture was designed around three principles: eliminate implicit trust at every layer, separate concerns cleanly across subscription boundaries, and make the platform reproducible through Infrastructure as Code.
Multi-Subscription Landing Zone
The new landing zone organized workloads into purpose-built subscriptions:
Tenant Root Group
├── Platform
│   ├── Connectivity   ← Hub VNet, Azure Firewall, DNS, Bastion
│   ├── Identity       ← Azure AD, Conditional Access, PIM
│   └── Management     ← Log Analytics, Defender for Cloud, Sentinel
└── Workload
    ├── Development    ← Isolated Dev environment
    ├── QA / Testing   ← Pre-production validation
    └── Production     ← Live workloads, hardened policies
Azure Policy initiatives were assigned at the management group level with deny effects for critical controls and deployIfNotExists for monitoring and tagging. No workload subscription can operate outside these guardrails.
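As an illustration, a guardrail assignment at the management-group level can be expressed as an ARM fragment like the one below. The assignment name, management group, and initiative definition ID are placeholders, not values from the engagement:

```json
{
  "type": "Microsoft.Authorization/policyAssignments",
  "apiVersion": "2022-06-01",
  "name": "baseline-guardrails",
  "properties": {
    "displayName": "Baseline security and monitoring guardrails",
    "policyDefinitionId": "[parameters('initiativeDefinitionId')]",
    "enforcementMode": "Default"
  }
}
```

Deployed at management-group scope, the assignment is inherited by every workload subscription beneath it, which is what makes the "no subscription can operate outside the guardrails" claim enforceable rather than aspirational.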
Zero Trust Security Model
The architecture was designed to verify every request, enforce least privilege, and assume breach at every layer.
Perimeter Layer
Azure Application Gateway with WAF (OWASP Core Rule Set 3.2) sits in front of all application traffic:
- SSL termination at the gateway — no plain HTTP reaches backend services
- Custom WAF rules for application-specific threat patterns
- All traffic logged to Log Analytics for analysis
Network Layer
Internet
│
▼
Application Gateway + WAF (ingress, Layer 7)
│
▼
Azure Firewall (egress + east-west inspection)
│
├── App Tier Subnet (NSG: allow only from App Gateway)
├── Data Tier Subnet (NSG: allow only from App Tier)
└── Management Subnet (NSG: Bastion only, no direct RDP/SSH)
Network Security Groups enforce micro-segmentation at the subnet level. No direct internet access is permitted to any backend resource.
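A minimal sketch of the App Tier subnet's NSG rules, assuming illustrative address ranges (10.0.1.0/24 for the App Gateway subnet, 10.0.2.0/24 for the app tier — not the actual ranges used):

```json
"securityRules": [
  {
    "name": "allow-appgw-to-app-tier",
    "properties": {
      "priority": 100,
      "direction": "Inbound",
      "access": "Allow",
      "protocol": "Tcp",
      "sourceAddressPrefix": "10.0.1.0/24",
      "sourcePortRange": "*",
      "destinationAddressPrefix": "10.0.2.0/24",
      "destinationPortRange": "443"
    }
  },
  {
    "name": "deny-all-other-inbound",
    "properties": {
      "priority": 4000,
      "direction": "Inbound",
      "access": "Deny",
      "protocol": "*",
      "sourceAddressPrefix": "*",
      "sourcePortRange": "*",
      "destinationAddressPrefix": "*",
      "destinationPortRange": "*"
    }
  }
]
```

The explicit deny rule at a high priority number ensures nothing slips through on Azure's permissive default rules; only traffic matching the allow rule above it reaches the app tier.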
Identity Layer
A Zero Trust identity model was implemented using Azure Active Directory:
- Conditional Access policies: MFA required for all users, compliant devices required for privileged roles
- Privileged Identity Management (PIM): just-in-time elevation for admin roles with approval workflows
- Managed Identities: all service-to-service authentication via managed identity — no stored credentials
- RBAC model with four roles: Developer, DevOps Engineer, Platform Admin, Support Engineer
Data Layer
Azure Key Vault was introduced as the central secrets store:
- All connection strings, API keys, and certificates migrated from application configs into Key Vault
- Applications reference secrets via Managed Identity — no secrets in code or environment variables
- Key Vault access policies scoped to minimum required permissions per application
- Soft delete and purge protection enabled on all vaults
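A hedged sketch of the vault configuration described above, with the vault name and application identity supplied as parameters (placeholders, not engagement values):

```json
{
  "type": "Microsoft.KeyVault/vaults",
  "apiVersion": "2023-07-01",
  "name": "[parameters('vaultName')]",
  "location": "[resourceGroup().location]",
  "properties": {
    "tenantId": "[subscription().tenantId]",
    "sku": { "family": "A", "name": "standard" },
    "enableSoftDelete": true,
    "enablePurgeProtection": true,
    "accessPolicies": [
      {
        "tenantId": "[subscription().tenantId]",
        "objectId": "[parameters('appIdentityObjectId')]",
        "permissions": { "secrets": [ "get", "list" ] }
      }
    ]
  }
}
```

The access policy grants only `get` and `list` on secrets to the application's managed identity — the minimum needed to resolve configuration at runtime.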
Threat Detection Layer
Microsoft Defender for Cloud and Azure Sentinel were integrated as the security operations layer:
| Service | Function |
|---|---|
| Defender for Cloud | Continuous security posture assessment, vulnerability scanning, regulatory compliance dashboard |
| Azure Sentinel | SIEM — log aggregation, threat detection rules, incident management |
| Log Analytics | Centralised workspace for all diagnostic logs, security events, and audit trails |
Sentinel detection rules were configured for:
- Impossible travel / anomalous sign-in behaviour
- Mass resource deletion or privilege escalation
- Outbound traffic to known malicious IPs
- Key Vault access anomalies
Private Connectivity
All PaaS services were integrated into the VNet via private endpoints:
- Azure SQL accessible only from the App Tier subnet
- App Services integrated into VNet with outbound routing through the hub firewall
- Storage accounts restricted to private endpoint access only
- Public network access disabled on all data-tier resources
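The Azure SQL restriction above can be sketched as a private endpoint resource placed in the App Tier subnet. Resource names and the subnet name (`app-tier`) are illustrative:

```json
{
  "type": "Microsoft.Network/privateEndpoints",
  "apiVersion": "2023-05-01",
  "name": "sql-private-endpoint",
  "location": "[resourceGroup().location]",
  "properties": {
    "subnet": {
      "id": "[resourceId('Microsoft.Network/virtualNetworks/subnets', parameters('vnetName'), 'app-tier')]"
    },
    "privateLinkServiceConnections": [
      {
        "name": "sql-connection",
        "properties": {
          "privateLinkServiceId": "[resourceId('Microsoft.Sql/servers', parameters('sqlServerName'))]",
          "groupIds": [ "sqlServer" ]
        }
      }
    ]
  }
}
```

Paired with `publicNetworkAccess: Disabled` on the SQL server itself, this leaves the private endpoint's NIC in the App Tier subnet as the only route to the database.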
Migration Execution
The cross-subscription migration was the highest-risk phase of the engagement. It was executed using a structured approach to eliminate disruption.
Dependency Mapping
Before any migration activity, a full dependency map was produced:
- Application → database connections documented
- API endpoint dependencies identified
- Certificate and DNS dependencies catalogued
- External integrations (webhooks, third-party APIs) inventoried
This mapping drove the migration sequencing — services with no downstream dependencies were migrated first.
Migration Approach by Service Type
App Services
Each App Service was recreated in the target subscription:
```json
{
  "type": "Microsoft.Web/sites",
  "apiVersion": "2022-03-01",
  "name": "[parameters('appServiceName')]",
  "location": "[resourceGroup().location]",
  "properties": {
    "serverFarmId": "[resourceId('Microsoft.Web/serverfarms', parameters('planName'))]",
    "siteConfig": {
      "vnetRouteAllEnabled": true,
      "appSettings": [
        { "name": "KEY_VAULT_URI", "value": "[parameters('keyVaultUri')]" }
      ]
    }
  }
}
```
Application settings referencing secrets were updated to point to Key Vault references rather than inline values during the migration — eliminating the credential exposure that existed in the source environment.
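A Key Vault reference in an app setting uses Azure App Service's `@Microsoft.KeyVault(...)` syntax; the setting below is an illustrative example (vault and secret names are placeholders):

```json
{
  "appSettings": [
    {
      "name": "SQL_CONNECTION_STRING",
      "value": "@Microsoft.KeyVault(SecretUri=https://kv-prod.vault.azure.net/secrets/SqlConnectionString/)"
    }
  ]
}
```

The app's managed identity resolves the secret at runtime, so the connection string never appears in configuration, source control, or deployment pipelines.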
SQL Databases
Databases were migrated via backup/restore with validation at each stage:
- Full backup taken in source subscription
- Backup validated and restored to target in read-only mode
- Application connectivity tested against restored copy
- Final differential backup and restore during maintenance window
- DNS/connection string cutover with rollback window held open for 24 hours
Key Vault Secrets
Secrets were exported from the source environment (where they existed in application configs) and imported into the new vaults, with access policies pre-configured before any application cutover.
Network Resources
VNets were recreated in the target subscriptions with updated IP ranges to avoid conflicts:
- Subnets defined and NSGs pre-applied before any workload migration
- Private DNS zones created and linked to the new VNets
- Peering to the hub VNet established and validated before traffic was routed
Cutover Strategy
Each service cutover followed a four-step pattern:
- Pre-cutover validation — application smoke-tested in new subscription with production-like data
- DNS pre-staging — TTL reduced to 60 seconds, 48 hours ahead of the cutover window
- Cutover execution — DNS records updated, old endpoints deprecated
- Monitoring hold — 30-minute observation window with rollback capability before old resources decommissioned
This sequencing ensured that at no point during the migration was a service unavailable to end users.
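The DNS pre-staging step can be sketched as an Azure DNS record set with its TTL dropped to 60 seconds, so that the later record update propagates within a minute. Zone, record, and target names here are illustrative, not the actual ones:

```json
{
  "type": "Microsoft.Network/dnsZones/CNAME",
  "apiVersion": "2018-05-01",
  "name": "contoso.com/app",
  "properties": {
    "TTL": 60,
    "CNAMERecord": {
      "cname": "app-prod.azurewebsites.net"
    }
  }
}
```

At cutover, only the `cname` target changes; the short TTL ensures resolvers pick up the new endpoint quickly, and reverting the target is an equally fast rollback.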
High Availability and Reliability
Multi-Node Application Cluster
The Appian application tier was deployed as a 3-node cluster:
| Node | Role | Availability Zone |
|---|---|---|
| Primary | Core processing, workflow engine | Zone 1 |
| Secondary A | Read scaling, synchronisation | Zone 2 |
| Secondary B | Failover, scheduled jobs | Zone 3 |
Azure Load Balancer distributes traffic across nodes with health probe monitoring. Failover is automatic — if the primary node becomes unavailable, traffic is rerouted within seconds.
Automation and Self-Healing
Operational automation replaced all manual service management:
- Scheduled VM start/stop via Azure Automation runbooks
- Service health checks via Azure Monitor with automated alerting
- Autoscale rules configured for the App Service Plan based on CPU and memory thresholds
- Automated backup policies on all databases with geo-redundant storage
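The CPU-based autoscale rule described above can be sketched as an ARM fragment like the following. The capacity bounds, threshold, and windows are illustrative assumptions, not the tuned production values:

```json
{
  "type": "Microsoft.Insights/autoscalesettings",
  "apiVersion": "2022-10-01",
  "name": "app-plan-autoscale",
  "location": "[resourceGroup().location]",
  "properties": {
    "targetResourceUri": "[resourceId('Microsoft.Web/serverfarms', parameters('planName'))]",
    "enabled": true,
    "profiles": [
      {
        "name": "cpu-based",
        "capacity": { "minimum": "2", "maximum": "5", "default": "2" },
        "rules": [
          {
            "metricTrigger": {
              "metricName": "CpuPercentage",
              "metricResourceUri": "[resourceId('Microsoft.Web/serverfarms', parameters('planName'))]",
              "timeGrain": "PT1M",
              "statistic": "Average",
              "timeWindow": "PT10M",
              "timeAggregation": "Average",
              "operator": "GreaterThan",
              "threshold": 75
            },
            "scaleAction": {
              "direction": "Increase",
              "type": "ChangeCount",
              "value": "1",
              "cooldown": "PT5M"
            }
          }
        ]
      }
    ]
  }
}
```

A complementary scale-in rule (the same trigger with `LessThan` and `direction: Decrease`) would normally accompany this so the plan contracts when load subsides.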
Disaster Recovery Posture
The architecture was designed for recovery from both individual component failures and broader regional events:
- Azure SQL geo-replication to a paired region
- Recovery Time Objective (RTO): 1 hour
- Recovery Point Objective (RPO): 15 minutes
- Traffic failover via Azure Traffic Manager if primary region becomes unavailable
Governance Framework
Governance was embedded into the platform structure rather than applied as a post-deployment checklist.
Resource Governance
A mandatory tagging policy was enforced via Azure Policy at the management group level:
| Tag | Required | Purpose |
|---|---|---|
| environment | Yes | dev / qa / prod |
| owner | Yes | Team or individual accountable |
| cost-centre | Yes | Finance allocation |
| application | Yes | Workload identifier |
Resources deployed without required tags are flagged in Defender for Cloud and blocked in production by a deny policy effect.
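The production deny effect can be expressed as an Azure Policy rule that rejects any deployment missing one of the four mandatory tags — a minimal sketch of the rule body, not the exact definition used:

```json
{
  "policyRule": {
    "if": {
      "anyOf": [
        { "field": "tags['environment']", "exists": "false" },
        { "field": "tags['owner']", "exists": "false" },
        { "field": "tags['cost-centre']", "exists": "false" },
        { "field": "tags['application']", "exists": "false" }
      ]
    },
    "then": { "effect": "deny" }
  }
}
```

Assigning this with `deny` in production and `audit` in dev/QA gives the graduated enforcement the section describes: flagged everywhere, blocked where it matters most.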
Access Governance
- PIM configured for all roles above Contributor — elevation requires approval and expires after 8 hours
- Access reviews scheduled quarterly via Azure AD Identity Governance
- Break-glass emergency accounts are monitored by Sentinel with alerts on any usage
- Service principal credentials were fully replaced by Managed Identities — no password-based service accounts remain
Policy Enforcement
Azure Policy initiatives enforce baseline standards across all workload subscriptions:
| Policy | Effect | Scope |
|---|---|---|
| No public IP on VMs | Deny | All workload subscriptions |
| Storage HTTPS only | Deny | All subscriptions |
| Diagnostic settings required | DeployIfNotExists | All resource types |
| Key Vault soft delete | Audit + Deny | All subscriptions |
| Defender for Cloud Standard | DeployIfNotExists | All subscriptions |
Results
| Dimension | Before | After |
|---|---|---|
| Service disruptions during migration | — | Zero |
| Direct internet exposure | All backend services | Eliminated — all traffic via App Gateway + WAF |
| Secrets management | Inline in application configs | Centralised in Key Vault via Managed Identity |
| Threat detection | None | Defender for Cloud + Sentinel with active alert rules |
| Subscription isolation | Single shared subscription | 5 dedicated subscriptions with policy enforcement |
| Manual operations | ~100% | Reduced by over 70% |
| Compliance visibility | No framework | Defender for Cloud regulatory compliance dashboard |
| Access governance | Unmanaged, accumulated permissions | RBAC + PIM + quarterly access reviews |
The migration completed without a single service disruption. That's the result that kept people up at night beforehand, and it's what validated the dependency mapping and the sequenced cutover approach.
The security posture is now proactive rather than reactive. Defender for Cloud tracks compliance continuously. Sentinel surfaces threats in real time. Key Vault eliminated the credential sprawl that had been accumulating quietly for years. The environment that came out the other side doesn't resemble what went in.
