Background
Cloud adoption had outpaced any attempt to govern it. Teams provisioned what they needed, resources accumulated with no attribution, and by the time finance flagged the monthly bill, nobody could answer the basic question: where is the money going?
Two years of decentralized Azure expansion had produced a sprawling environment across multiple subscriptions — inconsistent naming, no tagging, no centralized view of what was running or why. The executive team was paying the bill without understanding it.
Anil Choudhary led the cost optimization initiative, from the initial audit through governance implementation and the setup of an ongoing review framework.
The Challenge
No Cost Visibility
The Azure bill arrived as a monolithic number. Decomposing it into meaningful cost centres was not possible because:
- No tagging existed on the majority of resources — 70%+ of resources had no owner, environment, or application tag
- Multiple subscriptions with no consistent naming made identifying which teams owned which resources an exercise in guesswork
- Azure Cost Management showed cost by service type (compute, storage, networking) but could not attribute cost by team, application, or business unit
- Engineering teams had no visibility into the cost of their own workloads — there was no feedback loop connecting technical decisions to financial impact
Uncontrolled Provisioning
Resources were being created without any approval or review process:
- Development subscriptions had Contributor access for large groups of engineers
- No policy prevented provisioning of expensive resource types (high-SKU VMs, premium storage, large databases) without justification
- Resources created for one-time tasks or experiments were never decommissioned
- No auto-shutdown policies existed on development or test VMs
Idle and Oversized Resources
A detailed resource utilization audit revealed significant waste:
- VMs running 24/7 with CPU consistently below 5% — clearly oversized for their workload
- Development and test VMs running overnight and on weekends when no development was in progress
- Databases provisioned at high service tiers with minimal actual usage
- Storage accounts with data never accessed in months — retained "just in case"
- Orphaned resources: managed disks with no attached VM, public IPs with no associated resource, network interfaces unattached
No Reserved Instance Coverage
The organization was paying on-demand prices for workloads that had been running continuously for over a year. Reserved instances (1-year or 3-year commitments) on stable workloads offered significant discounts — but no one had assessed what was eligible.
Implementation
Phase 1: Tagging Policy and Remediation
A mandatory tagging policy was the foundation everything else depended on. Without tags, cost attribution was impossible.
Required Tags (enforced via Azure Policy)
| Tag | Values | Purpose |
|---|---|---|
| `environment` | dev / test / staging / prod | Environment identification |
| `owner` | Team or individual name | Accountability |
| `cost-centre` | Finance cost centre code | Financial allocation |
| `application` | Application or service name | Workload identification |
| `expiry-date` | ISO date or `permanent` | Identifies temporary resources |
Policy Implementation
```json
{
  "mode": "Indexed",
  "policyRule": {
    "if": {
      "allOf": [
        {
          "field": "type",
          "notIn": [
            "Microsoft.Resources/subscriptions",
            "Microsoft.Resources/resourceGroups"
          ]
        },
        {
          "anyOf": [
            { "field": "tags['environment']", "exists": false },
            { "field": "tags['owner']", "exists": false },
            { "field": "tags['cost-centre']", "exists": false },
            { "field": "tags['application']", "exists": false }
          ]
        }
      ]
    },
    "then": {
      "effect": "deny"
    }
  }
}
```
The policy was applied in audit mode first for 2 weeks to measure the scope of non-compliance, then switched to deny for all new resources. Existing untagged resources were remediated via a bulk tagging exercise using Azure Resource Graph queries to identify them and Azure CLI scripts to apply tags at scale.
```bash
# Bulk tag remediation — apply owner tag to all untagged resources in a subscription
# (--is-incremental appends tags rather than replacing any existing ones)
az resource list --subscription "$SUB_ID" --query "[?tags.owner==null].id" -o tsv | \
  xargs -I{} az resource tag --is-incremental --ids {} --tags owner="$TEAM_NAME" cost-centre="$CC_CODE"
```
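The deny rule in the policy above can be mirrored locally. A minimal Python sketch of the same `allOf`/`anyOf` evaluation (the resource dicts below are hypothetical, not real Azure objects, and Azure Policy itself evaluates this server-side):

```python
# Local sketch of the tagging policy's deny rule (illustrative only; the
# sample resources are made up).
REQUIRED_TAGS = ("environment", "owner", "cost-centre", "application")
EXEMPT_TYPES = {
    "Microsoft.Resources/subscriptions",
    "Microsoft.Resources/resourceGroups",
}

def is_denied(resource: dict) -> bool:
    """Deny when the resource is in scope and any required tag is missing."""
    if resource["type"] in EXEMPT_TYPES:  # mirrors the notIn condition
        return False
    tags = resource.get("tags") or {}
    # mirrors the anyOf block: one missing tag is enough to deny
    return any(tag not in tags for tag in REQUIRED_TAGS)

partially_tagged = {
    "type": "Microsoft.Compute/virtualMachines",
    "tags": {"owner": "platform-team"},
}
fully_tagged = {
    "type": "Microsoft.Compute/virtualMachines",
    "tags": {
        "environment": "dev",
        "owner": "platform-team",
        "cost-centre": "CC-1001",
        "application": "billing-api",
    },
}
print(is_denied(partially_tagged))  # → True
print(is_denied(fully_tagged))      # → False
```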
Phase 2: Cost Visibility Dashboard
With tagging in place, a Cost Management dashboard was built to give every stakeholder a view of their costs:
Executive Dashboard
- Total monthly spend vs. budget with trend line
- Top 5 cost centres by spend
- Month-over-month change by team
- Forecast vs. actual for current month
Team Dashboards
Each team received their own dashboard scope filtered to their cost-centre tag:
- Their current month's spend by application
- Top 5 most expensive resources
- Resources with `expiry-date` in the past (overdue for decommission)
- Recommendations from Azure Advisor for their resources
Budget Alerts
Budgets were set per cost centre at 80% and 100% thresholds, with email and Teams notifications to the team owner and their manager.
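The threshold logic behind those alerts is simple to sketch. A hedged Python illustration (the 80%/100% values come from the text; the function name and figures are assumptions):

```python
# Sketch of per-cost-centre budget alerting: return which thresholds have
# been crossed for a given spend-to-date (80% and 100%, as configured).
def fired_thresholds(spend: float, budget: float, thresholds=(0.80, 1.00)):
    return [t for t in thresholds if spend >= budget * t]

print(fired_thresholds(850.0, 1000.0))   # → [0.8]
print(fired_thresholds(1050.0, 1000.0))  # → [0.8, 1.0]
print(fired_thresholds(500.0, 1000.0))   # → []
```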
Phase 3: Idle and Oversized Resource Remediation
The utilization audit identified resources in three categories:
Category 1: Terminate immediately
- Orphaned managed disks (no attached VM) — 47 found, costing ~$1,200/month
- Unattached public IP addresses — 23 found
- Stopped VMs still incurring storage and IP costs — 31 found
- Expired temporary resources — 18 resources with past expiry dates
These were confirmed with resource owners and deleted within the first two weeks.
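Orphan detection of this kind reduces to a null-field filter over an inventory dump. A Python sketch: the field names (`managedBy` on disks, `ipConfiguration` on public IPs) match the Azure CLI's JSON output, but the sample records here are hypothetical:

```python
# Flag orphaned resources from inventory dumps (e.g. `az disk list` and
# `az network public-ip list -o json`). Sample data below is made up.
def find_orphans(disks, public_ips):
    orphan_disks = [d["name"] for d in disks if not d.get("managedBy")]
    orphan_ips = [p["name"] for p in public_ips if not p.get("ipConfiguration")]
    return orphan_disks, orphan_ips

disks = [
    {"name": "vm1-osdisk", "managedBy": ".../virtualMachines/vm1"},
    {"name": "old-data-disk", "managedBy": None},  # no attached VM: orphan
]
public_ips = [
    {"name": "lb-ip", "ipConfiguration": {"id": ".../ipConfigurations/ipc1"}},
    {"name": "stale-ip", "ipConfiguration": None},  # unattached: orphan
]
print(find_orphans(disks, public_ips))  # → (['old-data-disk'], ['stale-ip'])
```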
Category 2: Rightsize
VMs with consistently low CPU and memory utilization over 30 days were identified via Azure Monitor metrics:
```kusto
Perf
| where TimeGenerated > ago(30d)
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AvgCPU = avg(CounterValue), MaxCPU = max(CounterValue) by Computer
| where AvgCPU < 10 and MaxCPU < 30
| join kind=inner (
    Heartbeat | summarize by Computer, ResourceGroup, SubscriptionId
) on Computer
```
34 VMs were identified for rightsizing. A comparison of current SKU to recommended SKU (based on Azure Advisor recommendations and actual utilization) showed an average cost reduction of 40% per machine after resizing.
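The per-machine saving can be estimated before any resize by comparing SKU prices. A Python sketch with hypothetical monthly prices (the figures are placeholders constructed to illustrate the 40% average from the text, not real Azure rates):

```python
# Estimate rightsizing savings: current vs. recommended SKU monthly cost.
# Prices are hypothetical placeholders, not real Azure pricing.
candidates = [
    {"vm": "app-vm-01", "current": 280.0, "recommended": 140.0},  # 50% cheaper
    {"vm": "app-vm-02", "current": 200.0, "recommended": 140.0},  # 30% cheaper
]

def reduction(c):
    return 1 - c["recommended"] / c["current"]

avg = sum(reduction(c) for c in candidates) / len(candidates)
print(f"{avg:.0%}")  # → 40%
```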
Category 3: Implement auto-shutdown
Development and test VMs running outside of business hours were placed under an auto-shutdown schedule:
```bash
# Apply auto-shutdown to all VMs tagged environment=dev
# (az vm auto-shutdown interprets --time as UTC)
az vm list --query "[?tags.environment=='dev'].id" -o tsv | \
while read -r vm_id; do
  az vm auto-shutdown --ids "$vm_id" --time 1900
done
```
Development VMs shutting down at 19:00 UTC (7pm) and restarting manually when needed reduced compute spend on the development fleet by ~62%.
Phase 4: Reserved Instance Strategy
An analysis of the production fleet identified workloads with stable, predictable resource requirements — candidates for reserved instance commitments:
| Workload Type | Current Cost (on-demand) | Reserved 1-yr | Reserved 3-yr | Recommendation |
|---|---|---|---|---|
| Production web tier VMs | $8,400/mo | $5,460/mo (35% saving) | $4,200/mo (50% saving) | 1-year RI |
| Database servers | $6,200/mo | $3,720/mo (40% saving) | $2,790/mo (55% saving) | 3-year RI |
| App Service Plans | $3,100/mo | $2,170/mo (30% saving) | $1,860/mo (40% saving) | 1-year RI |
Reserved instances were purchased in phases — starting with the workloads with the highest confidence in stability. The total annual commitment reduced compute costs on covered workloads by an average of 38%.
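The reserved prices in the table follow directly from the on-demand rate and the discount for the recommended term. A quick Python check of that arithmetic, using the figures from the table above:

```python
# Recompute the reserved-instance prices from the table: on-demand rate
# times (1 - discount) for the recommended term.
workloads = [
    ("Production web tier VMs", 8400, 0.35),  # 1-year RI
    ("Database servers",        6200, 0.55),  # 3-year RI
    ("App Service Plans",       3100, 0.30),  # 1-year RI
]
for name, on_demand, discount in workloads:
    reserved = on_demand * (1 - discount)
    print(f"{name}: ${reserved:,.0f}/mo")
# → Production web tier VMs: $5,460/mo
# → Database servers: $2,790/mo
# → App Service Plans: $2,170/mo
```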
Phase 5: FinOps Governance Framework
The final phase established ongoing governance to prevent cost sprawl from recurring:
Monthly Cost Review
A structured monthly review was established with team leads and the finance team:
- Actual vs. budget per cost centre
- Resources flagged for decommission
- New resource spending reviewed against justification
- Reserved instance utilization reviewed (unused RI capacity is wasted)
Cost Review Board
For any resource with a monthly cost projection above a defined threshold, pre-provisioning approval was required from the Cost Review Board — a lightweight process (async Teams message with cost estimate) rather than a formal committee meeting.
Savings Tracking
A running savings log was maintained to track the cumulative impact of optimization activities, providing visibility for executive reporting.
Results
| Category | Monthly Saving |
|---|---|
| Terminated idle/orphaned resources | ~$4,200 |
| VM rightsizing | ~$6,800 |
| Dev/test auto-shutdown | ~$5,100 |
| Reserved instance commitments | ~$8,300 |
| Database tier optimization | ~$3,600 |
| Total monthly reduction | ~$28,000 (35%) |

| Governance Metric | Before | After |
|---|---|---|
| Resources with required tags | Below 30% | 100% (enforced) |
| Cost attribution by team | Impossible | Real-time dashboard per cost centre |
| Budget alerts | None | 80% and 100% thresholds per team |
| Reserved instance coverage | 0% | 68% of eligible production workloads |
| Dev VM overnight running | ~100% | Near zero — auto-shutdown enforced |
| Orphaned resource accumulation | Ongoing | Monthly review with expiry-date enforcement |
The 35% reduction in monthly spend happened within 10 weeks. But the more durable outcome is the governance framework. Cost visibility, tagging enforcement, and monthly reviews mean the savings compound rather than quietly erode. Without the governance layer, the same patterns would have re-emerged within a year.
