Background
The applications had been built feature-first for years. Every sprint added something. Performance was never the priority — until users started noticing.
Page loads averaging 6–8 seconds. Server utilization running high even under normal load, meaning any traffic spike caused degradation. Abandonment rates climbing in direct correlation with the load times.
The causes weren't mysterious: no CDN, no caching anywhere in the stack, unoptimized database queries on hot code paths, and frontend bundles loading everything regardless of which page the user was actually on. Each problem was independently significant. Together, they'd been compounding for a while.
Anil Choudhary led the performance audit and optimization program across both infrastructure and application layers.
Performance Audit
Before implementing any changes, a structured performance audit was conducted to identify the highest-impact bottlenecks. Optimization effort was sequenced by impact-to-effort ratio — the changes delivering the greatest improvement with the least risk came first.
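To make the sequencing concrete, here is a toy sketch of ranking candidate fixes by impact-to-effort ratio. The candidate names and scores are illustrative only, not the audit's actual figures:

```python
# Rank candidate optimizations by estimated impact relative to effort.
# Items and scores below are hypothetical examples, not audit data.

def prioritize(candidates: list[dict]) -> list[dict]:
    """Sort candidates by impact/effort ratio, highest first."""
    return sorted(candidates, key=lambda c: c["impact"] / c["effort"], reverse=True)

candidates = [
    {"name": "Deploy CDN",        "impact": 9, "effort": 3},
    {"name": "Add Redis caching", "impact": 8, "effort": 4},
    {"name": "Rewrite ORM layer", "impact": 6, "effort": 9},
    {"name": "Index hot queries", "impact": 7, "effort": 2},
]

for c in prioritize(candidates):
    print(f'{c["name"]}: ratio {c["impact"] / c["effort"]:.2f}')
```

Under this scoring, low-effort wins like indexing and CDN deployment naturally sort ahead of invasive rewrites.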
Baseline Measurements
| Metric | Baseline |
|---|---|
| Average page load time (desktop) | 6.8 seconds |
| Average page load time (mobile) | 9.2 seconds |
| Time to First Byte (TTFB) | 1.8 seconds |
| Largest Contentful Paint (LCP) | 5.4 seconds |
| Server CPU at normal load | 72% average |
| Server memory at normal load | 68% average |
| Database query time (p95) | 820ms |
| CDN / caching in use | None |
Audit Findings
**No CDN — all requests hit origin servers.** Static assets (images, CSS, JavaScript, fonts) were served directly from the application servers. A user in Los Angeles accessing an application hosted in East US received all static content from 2,500+ miles away.
**No caching — every response computed fresh.** API responses that returned largely static data (navigation menus, product catalogues, configuration data) were recomputed from database queries on every request. The same queries ran thousands of times per minute, returning identical results.
**5 database queries accounting for 60% of total DB time.** A query analysis of the production database identified 5 queries called at very high frequency with full table scans — no indexes on the columns being filtered, no query result caching.
**Frontend bloat.** The main application page loaded 4.2MB of JavaScript on initial load, including several libraries used only on specific sub-pages. Every user paid the cost of the full bundle regardless of which page they actually visited.
**Fixed infrastructure with no scaling.** The App Service Plan was configured with a fixed instance count. Under normal load it ran at 70%+ CPU — any traffic spike pushed it into degraded territory. There was no scale-out trigger.
Implementation
Layer 1: Azure Front Door — CDN and Global Load Balancing
Azure Front Door was deployed as the global entry point for all web traffic:
Static Asset Caching
All static assets were cached at Front Door's edge PoPs (Points of Presence) globally:
```json
{
  "routingRules": [
    {
      "name": "StaticAssets",
      "matchConditions": {
        "paths": ["/static/*", "/images/*", "/fonts/*", "*.css", "*.js"]
      },
      "routeConfiguration": {
        "cacheConfiguration": {
          "queryStringCachingBehavior": "IgnoreQueryString",
          "cacheDuration": "7.00:00:00"
        }
      }
    },
    {
      "name": "DynamicContent",
      "matchConditions": {
        "paths": ["/api/*", "/app/*"]
      },
      "routeConfiguration": {
        "cacheConfiguration": null,
        "forwardingProtocol": "HttpsOnly",
        "originResponseTimeoutSeconds": 30
      }
    }
  ]
}
```
Static assets with a 7-day cache TTL were served from the nearest PoP — a user in London received assets from the London PoP rather than the East US origin. Static content requests stopped reaching the origin servers entirely.
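The `cacheDuration` value in the rule above uses .NET `TimeSpan` notation (`d.hh:mm:ss`). A quick sketch confirming that `7.00:00:00` works out to seven days:

```python
from datetime import timedelta

def parse_timespan(value: str) -> timedelta:
    """Parse a d.hh:mm:ss TimeSpan string like Front Door's cacheDuration."""
    days = 0
    # An optional day component precedes the first colon, separated by a dot
    if "." in value.split(":")[0]:
        day_part, value = value.split(".", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(p) for p in value.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)

ttl = parse_timespan("7.00:00:00")
print(int(ttl.total_seconds()))  # 604800 seconds, i.e. 7 days
```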
Global Load Balancing
Front Door's health probes and routing rules distributed dynamic requests across the origin pool, with automatic failover if an origin became unhealthy.
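A rough sketch of probe-driven origin selection follows. The field names and the latency tie-break are assumptions for illustration; Front Door's actual routing also weighs configured priority and weight settings:

```python
# Illustrative stand-in for probe-driven failover: route to the healthy
# origin with the lowest probe latency. Pool entries are hypothetical.

def pick_origin(origins: list[dict]) -> dict:
    """Return the healthy origin with the lowest measured probe latency."""
    healthy = [o for o in origins if o["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy origins in pool")
    return min(healthy, key=lambda o: o["latency_ms"])

pool = [
    {"name": "eastus-1", "healthy": True,  "latency_ms": 12},
    {"name": "eastus-2", "healthy": False, "latency_ms": 9},   # failed probe
    {"name": "westus-1", "healthy": True,  "latency_ms": 48},
]
print(pick_origin(pool)["name"])  # eastus-1
```

The unhealthy origin is skipped entirely, which is the failover behaviour the health probes provide automatically.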
Layer 2: Redis Caching
Azure Cache for Redis was deployed as an application-level cache for two categories of data:
API Response Caching
Frequently requested, slowly changing API responses were cached with appropriate TTLs:
```python
import json
import os
from datetime import timedelta
from functools import wraps

import redis

# Connection settings come from environment variables
redis_client = redis.Redis(
    host=os.environ["REDIS_HOST"],
    port=6380,
    password=os.environ["REDIS_PASSWORD"],
    ssl=True,
)

def cache_response(ttl_seconds: int, key_prefix: str):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Key covers positional and keyword arguments so distinct
            # calls never collide on the same cache entry
            parts = [str(a) for a in args] + [f"{k}={v}" for k, v in sorted(kwargs.items())]
            cache_key = f"{key_prefix}:{':'.join(parts)}"
            cached = redis_client.get(cache_key)
            if cached is not None:
                return json.loads(cached)
            result = func(*args, **kwargs)
            redis_client.setex(cache_key, timedelta(seconds=ttl_seconds), json.dumps(result))
            return result
        return wrapper
    return decorator

@cache_response(ttl_seconds=300, key_prefix="nav_menu")
def get_navigation_menu(user_role: str) -> dict:
    # Database query now cached for 5 minutes per role
    return db.query("SELECT * FROM navigation WHERE role = %s", user_role)

@cache_response(ttl_seconds=3600, key_prefix="product_catalogue")
def get_product_catalogue(category_id: int) -> list:
    # Product data cached for 1 hour — invalidated on update
    return db.query("SELECT * FROM products WHERE category_id = %s", category_id)
```
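The "invalidated on update" comment implies an explicit cache-busting step on writes. A self-contained sketch of the pattern (a plain dict stands in for Redis so the example runs on its own; the key scheme mirrors the `cache_response` decorator):

```python
# Cache-aside invalidation sketch. A dict stands in for Redis here; in the
# real application this would be a redis_client.delete call.
cache: dict[str, str] = {}

def cache_key(category_id: int) -> str:
    # Mirrors the decorator's "<prefix>:<args>" key format
    return f"product_catalogue:{category_id}"

def invalidate_product_catalogue(category_id: int) -> None:
    # Drop the cached entry; the next read repopulates it from the database
    cache.pop(cache_key(category_id), None)

cache[cache_key(42)] = '[{"id": 1}]'   # simulate a populated cache entry
invalidate_product_catalogue(42)
print(cache_key(42) in cache)  # False, so the next read is a cache miss
```

Deleting rather than updating the entry keeps the write path simple: the read path already knows how to repopulate on a miss.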
Session Storage
User session data was moved from the database to Redis, reducing database load for every authenticated page request:
```python
# Session middleware configuration — sessions read and written via Redis
SESSION_ENGINE = 'django.contrib.sessions.backends.cache'
SESSION_CACHE_ALIAS = 'redis_sessions'

CACHES = {
    'redis_sessions': {
        'BACKEND': 'django_redis.cache.RedisCache',
        # The rediss:// scheme enables TLS on port 6380
        'LOCATION': f'rediss://{REDIS_HOST}:6380/1',
        'OPTIONS': {
            'CLIENT_CLASS': 'django_redis.client.DefaultClient',
            'PASSWORD': REDIS_PASSWORD,
        },
        'TIMEOUT': 1800,  # 30-minute session timeout
    }
}
```
This eliminated a database round-trip on every authenticated request — previously the heaviest source of database load from web sessions.
Layer 3: Database Query Optimization
The 5 identified high-frequency queries were analyzed and optimized:
Query 1: User permissions lookup (called on every authenticated API request)
Before:
```sql
-- Full table scan — 180ms average
SELECT p.permission_name
FROM user_roles ur
JOIN role_permissions rp ON ur.role_id = rp.role_id
JOIN permissions p ON rp.permission_id = p.id
WHERE ur.user_id = @userId
```
After — supporting indexes added:
```sql
-- Covering index for the user_id filter
CREATE INDEX idx_user_roles_user_id ON user_roles(user_id) INCLUDE (role_id);
CREATE INDEX idx_role_permissions ON role_permissions(role_id, permission_id);
-- Result: 4ms average (97.8% reduction)
```
The result was also cached in Redis with a 15-minute TTL (invalidated on role change), so the database query ran only on cache miss.
Query 2: Activity feed query (called on dashboard load)
Before:
```sql
-- Scanning 3 years of activity data — 620ms
SELECT * FROM activity_log WHERE user_id = @userId ORDER BY created_at DESC LIMIT 20
```
After — partial index on recent data:
```sql
-- Partial index covering only recent rows
CREATE INDEX idx_activity_recent ON activity_log(user_id, created_at DESC)
  WHERE created_at > '2024-01-01';
-- Result: 12ms average
```
Queries 3–5 followed similar patterns — missing indexes on frequently filtered columns, resolved with targeted index additions and query rewrites to avoid implicit type conversions causing index misses.
Total database query time at p95 improved from 820ms to 95ms after these 5 optimizations. The database CPU dropped from 65% to 28% at equivalent load.
Layer 4: Frontend Optimization
Code Splitting
JavaScript bundles were restructured to deliver only what each page needed:
```javascript
// Before: single bundle — all JS loaded on every page
import { FullCalendar } from './calendar'  // 280KB — only needed on calendar page
import { ChartLibrary } from './charts'    // 190KB — only needed on analytics page
import { PDFGenerator } from './pdf'       // 145KB — only needed on reports page

// After: dynamic imports — each chunk loaded only when its page is visited
import { lazy } from 'react'

const FullCalendar = lazy(() => import('./calendar'))
const ChartLibrary = lazy(() => import('./charts'))
const PDFGenerator = lazy(() => import('./pdf'))
```
Initial JavaScript bundle size dropped from 4.2MB to 680KB — an 84% reduction in initial load payload.
Image Optimization
Images were migrated to WebP format with responsive sizing:
```html
<!-- Before: single large JPEG served to all devices -->
<img src="/images/hero.jpg" />

<!-- After: WebP with responsive sizes, lazy loading -->
<picture>
  <source srcset="/images/hero-800.webp 800w, /images/hero-1200.webp 1200w"
          sizes="(max-width: 800px) 100vw, 1200px" type="image/webp" />
  <img src="/images/hero-800.jpg" loading="lazy" width="800" height="450" alt="..." />
</picture>
```
Removed Unused Dependencies
A dependency audit identified 12 npm packages imported globally but used only in specific contexts, or replaced by browser-native APIs. Removing them reduced the bundle further and eliminated 3 packages with known minor CVEs.
Layer 5: Infrastructure Autoscaling
The App Service Plan was configured with autoscale rules to handle traffic spikes without manual intervention:
```json
{
  "profiles": [{
    "name": "Auto created scale condition",
    "capacity": { "minimum": "2", "maximum": "10", "default": "2" },
    "rules": [
      {
        "metricTrigger": {
          "metricName": "CpuPercentage",
          "operator": "GreaterThan",
          "threshold": 70,
          "timeAggregation": "Average",
          "timeWindow": "PT5M"
        },
        "scaleAction": {
          "direction": "Increase",
          "type": "ChangeCount",
          "value": "1",
          "cooldown": "PT5M"
        }
      },
      {
        "metricTrigger": {
          "metricName": "CpuPercentage",
          "operator": "LessThan",
          "threshold": 30,
          "timeAggregation": "Average",
          "timeWindow": "PT10M"
        },
        "scaleAction": {
          "direction": "Decrease",
          "type": "ChangeCount",
          "value": "1",
          "cooldown": "PT10M"
        }
      }
    ]
  }]
}
```
The minimum instance count of 2 provided baseline redundancy; scale-out triggered at 70% CPU, keeping headroom for traffic spikes before degradation occurred.
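The rules above amount to a simple hysteresis band: scale out above 70% CPU, scale in below 30%, hold steady in between. An illustrative simulation of a single evaluation step (cooldown windows omitted; Azure Monitor performs the real evaluation):

```python
def scale_decision(avg_cpu: float, instances: int,
                   minimum: int = 2, maximum: int = 10) -> int:
    """Return the new instance count for one autoscale evaluation window."""
    if avg_cpu > 70 and instances < maximum:
        return instances + 1   # scale out by one instance
    if avg_cpu < 30 and instances > minimum:
        return instances - 1   # scale in by one instance
    return instances           # within the 30-70% band: hold steady

print(scale_decision(avg_cpu=85.0, instances=2))  # 3
print(scale_decision(avg_cpu=20.0, instances=3))  # 2
print(scale_decision(avg_cpu=20.0, instances=2))  # 2, the floor holds
```

The gap between the two thresholds prevents flapping: an instance added at 70% is not immediately removed unless load falls all the way below 30%.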
Results
| Metric | Before | After | Improvement |
|---|---|---|---|
| Average page load time (desktop) | 6.8s | 2.1s | 69% |
| Average page load time (mobile) | 9.2s | 3.0s | 67% |
| Time to First Byte (TTFB) | 1.8s | 0.3s | 83% |
| Largest Contentful Paint (LCP) | 5.4s | 1.8s | 67% |
| Server CPU at normal load | 72% | 28% | 61% reduction |
| Database query time (p95) | 820ms | 95ms | 88% |
| JavaScript initial bundle | 4.2MB | 680KB | 84% |
| CDN cache hit ratio | 0% | 87% for static assets | — |
| Infrastructure cost | Fixed 4 instances | 2–4 via autoscale | ~30% cost saving |
The 60%+ improvement came primarily from two things: CDN deployment serving static assets from the edge, and Redis caching eliminating redundant database queries. The database optimizations and frontend bundle changes compounded the improvement further and reduced infrastructure load enough that the autoscale floor could drop without affecting users. The pages that used to take 6–8 seconds now load in under 2.5 seconds.
