Background
Years of cloud expansion had left the privileged access model in a difficult state. Admin accounts with permanent, always-on permissions had spread across Azure subscriptions, on-premises Active Directory, and a secondary cloud environment. Active 24 hours a day, whether or not anyone was actually doing admin work. Each one a credential waiting to be exploited.
The server estate was in a similar position. Three hosting environments, no consistent configuration baseline, security agents deployed manually at different times. Some servers were covered. Some had outdated agents. A subset had nothing at all. Nobody had a reliable picture of which was which.
Anil Choudhary led the design, implementation, and handover end-to-end — covering Azure PIM, Ansible automation, and the Zero Trust model that tied it together.
The Challenge
Privileged Access as a Persistent Attack Surface
In financial services, privileged accounts are the highest-value targets for attackers. The existing access model had several critical weaknesses:
- Always-on Global Administrators — multiple accounts with permanent subscription-level admin rights, active whether or not any administrative task was underway
- No time-bounding — once a role was assigned, it remained assigned indefinitely with no expiry
- No approval workflow — developers and engineers could self-assign elevated permissions in some areas without oversight
- No MFA on role activation — privileged operations could be performed without any additional authentication challenge
- No audit trail — there was no centralized log of who had elevated access, when they used it, or what actions they took while elevated
A single compromised credential with persistent Global Administrator rights would give an attacker unrestricted access to the entire Azure tenant — no time limit, no automatic detection. In financial services, that's an existential risk.
Hybrid Infrastructure Complexity
The server estate spanned three environments with different management planes:
- Azure VMs — managed through Azure Resource Manager, policy-eligible, Arc-connected where needed
- On-premises servers — Windows Server and Linux, managed through legacy tooling, no consistent agent baseline
- Secondary cloud workloads — VMs in a non-Azure environment with no current integration into the organization's security monitoring
Ensuring consistent configuration — security agents, hardening baselines, patch levels — across all three required a tool and approach that could operate uniformly regardless of hosting location.
Configuration Drift and Manual Deployment
The existing deployment model was fully manual:
- Security agents were installed one server at a time via RDP/SSH
- No configuration baseline existed — servers were provisioned and configured differently depending on who did the work
- Drift accumulation meant the security posture of any individual server was unknown without connecting to it directly
- Scaling security tooling to new servers required manual intervention each time
- The time between a server being provisioned and a security agent being installed on it was measured in days, not minutes
For an organization operating under financial services compliance requirements, this was a gap any auditor would flag.
No Zero Trust Enforcement
The security model in place was implicitly perimeter-based: once inside the network, entities were broadly trusted. There was no verification layer applied to internal traffic, privileged operations, or lateral movement within the environment.
Architecture
The solution was built on three interdependent pillars: identity governance through Azure PIM, automated workload protection through Ansible-driven Defender deployment, and continuous verification through integrated monitoring.
Zero Trust Control Mapping
| Zero Trust Principle | Implementation |
|---|---|
| Never trust, always verify | Azure PIM: every privileged action requires activation, MFA, and approval |
| Use least privilege access | Eligible assignments only — no permanent admin roles |
| Assume breach | Defender for Servers: continuous vulnerability scanning and threat detection |
| Verify explicitly | Conditional Access: device compliance + MFA required for all admin sessions |
| Automate response | Ansible: consistent enforcement across the full estate; Defender playbooks for remediation |
Azure Privileged Identity Management
Design Philosophy
The PIM implementation was built around a single principle: no account should hold elevated permissions unless actively performing a task that requires them. The window between activation and expiry should be as short as the task permits.
Role Assignment Model
Two assignment types were configured, with clear policies on when each applies:
```
Eligible Assignment (default for all admin roles)
└── User must explicitly activate the role
└── Activation requires: MFA challenge + justification text + optional approval
└── Role is active for a configured duration (max 8 hours)
└── Role expires automatically — no manual deactivation needed

Active Assignment (exceptions only, requires documented business justification)
└── Break-glass emergency accounts only
└── Monitored by Azure Sentinel with immediate alert on any usage
└── Reviewed quarterly — active assignments not re-justified are reverted to eligible
```
Permanent Active assignments for operational roles were removed entirely during the implementation. Every operational admin account — including Global Administrator — was converted to Eligible.
Role Configuration by Persona
| Role | Assignment Type | Activation Duration | Approval Required | MFA Required |
|---|---|---|---|---|
| Global Administrator | Eligible | 4 hours | Yes — two approvers | Yes |
| Privileged Role Administrator | Eligible | 4 hours | Yes — one approver | Yes |
| Subscription Owner | Eligible | 8 hours | Yes — one approver | Yes |
| Contributor | Eligible | 8 hours | No | Yes |
| Security Reader | Eligible | 8 hours | No | No |
| Break-glass account | Active (permanent) | N/A | N/A | N/A — monitored |
Activation Workflow
When an engineer needs elevated access, the process is:
- Navigate to PIM in the Azure portal or trigger activation via PowerShell/CLI
- Select the role and specify the duration required (up to the configured maximum)
- Provide a justification describing the task (stored in the audit log)
- Complete an MFA challenge
- If approval is required: wait for an approver to review and approve (approvers are notified via email and Teams)
- Role becomes active — the engineer receives notification and the activation is logged
- Role expires automatically at the end of the configured duration
```powershell
# Activating a role via PowerShell (AzureADPreview module, for scripted workflows)
$schedule = New-Object Microsoft.Open.MSGraph.Model.AzureADMSPrivilegedSchedule
$schedule.Type = "Once"
$schedule.Duration = "PT4H"   # ISO 8601 duration: 4 hours
$schedule.StartDateTime = (Get-Date).ToUniversalTime().ToString("yyyy-MM-ddTHH:mm:ss.fffZ")

Open-AzureADMSPrivilegedRoleAssignmentRequest `
    -ProviderId "aadRoles" `
    -ResourceId "<tenant-id>" `
    -RoleDefinitionId "<role-definition-id>" `
    -SubjectId "<user-object-id>" `
    -AssignmentState "Active" `
    -Type "UserAdd" `
    -Reason "Deploying security policy update - incident INC-20231105" `
    -Schedule $schedule
```
Access Reviews
Quarterly access reviews were configured for all PIM-eligible assignments:
- Review owners: team leads for each function (not the security team — ownership sits with the business)
- Scope: all eligible and active role assignments
- Outcome: assignments not actively re-justified within the review window are automatically removed
- Results reported to the compliance team for audit evidence
Audit and Monitoring
All PIM activations, approvals, and denials are logged to Azure AD audit logs and forwarded to the Log Analytics workspace:
```kusto
// PIM role activations in the last 24 hours
// (the activation event, not the eligible/permanent assignment events)
AuditLogs
| where TimeGenerated > ago(24h)
| where OperationName has "Add member to role in PIM completed (PIM activation)"
| extend
    RoleName = tostring(TargetResources[0].displayName),
    ActivatedBy = tostring(InitiatedBy.user.userPrincipalName),
    Justification = tostring(AdditionalDetails[0].value)
| project TimeGenerated, ActivatedBy, RoleName, Justification, Result
| order by TimeGenerated desc
```
A Sentinel analytics rule fires an alert on any Global Administrator or Privileged Role Administrator activation outside business hours — a pattern inconsistent with normal operations that warrants investigation.
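A minimal sketch of such a rule's query, assuming activations appear under the operation name shown and using an illustrative 07:00–19:00 UTC business-hours window:

```kusto
// Sketch: GA/PRA activations outside business hours (window boundaries are assumptions)
AuditLogs
| where OperationName has "Add member to role in PIM completed (PIM activation)"
| extend RoleName = tostring(TargetResources[0].displayName)
| where RoleName in ("Global Administrator", "Privileged Role Administrator")
| extend ActivationHour = hourofday(TimeGenerated)
| where ActivationHour < 7 or ActivationHour >= 19
| project TimeGenerated, RoleName,
    Activator = tostring(InitiatedBy.user.userPrincipalName)
```

In practice the window would be expressed in local time and exclude weekends; the rule's value is the signal, not the precise boundary.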
Workload Protection: Defender for Servers
Deployment Decision
Defender for Servers Plan 2 was selected over Plan 1 for its inclusion of Microsoft Defender for Endpoint (MDE) integration, file integrity monitoring, and the 500 MB/day free Log Analytics data ingestion per covered server — significant at scale.
Hybrid Onboarding via Azure Arc
The on-premises and secondary cloud servers were onboarded into Azure management via Azure Arc before Defender deployment. Arc extends the Azure control plane to non-Azure machines, enabling policy assignment, Defender coverage, and monitoring through the same tooling used for Azure VMs.
Arc onboarding was scripted and included in the Ansible playbooks so that new servers automatically joined the Arc management plane on first provisioning:
```yaml
# tasks/arc-onboarding.yml
- name: Download Azure Arc agent
  get_url:
    url: "https://aka.ms/azcmagent-linux"
    dest: /tmp/install_linux_azcmagent.sh
    mode: '0755'

- name: Install Azure Arc agent
  command: /tmp/install_linux_azcmagent.sh
  args:
    creates: /usr/bin/azcmagent

- name: Connect to Azure Arc
  command: >
    azcmagent connect
    --subscription-id "{{ azure_subscription_id }}"
    --resource-group "{{ azure_resource_group }}"
    --tenant-id "{{ azure_tenant_id }}"
    --location "{{ azure_location }}"
    --cloud AzureCloud
  environment:
    AZURE_CLIENT_ID: "{{ arc_service_principal_id }}"
    AZURE_CLIENT_SECRET: "{{ arc_service_principal_secret }}"
  register: arc_connect_result

- name: Verify Arc connection
  command: azcmagent show
  register: arc_status
  failed_when: "'Connected' not in arc_status.stdout"
```
Server Coverage Matrix
| Environment | Onboarding Method | Defender Plan | Agent Type |
|---|---|---|---|
| Azure VMs | Native — Defender for Cloud policy | Plan 2 | MMA / AMA |
| On-premises Windows | Azure Arc | Plan 2 | AMA via Arc |
| On-premises Linux | Azure Arc | Plan 2 | AMA via Arc |
| Secondary cloud VMs | Azure Arc | Plan 2 | AMA via Arc |
A Defender for Cloud policy was assigned at the management group level with DeployIfNotExists effect — any Azure VM not covered by Defender for Servers Plan 2 is automatically remediated by the policy engine, with no manual intervention required.
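As a sketch, assigning a DeployIfNotExists policy with a system-assigned managed identity (required for the policy engine to remediate) and kicking off remediation of existing machines might look like the following; the management group name, policy definition ID, and region are placeholders, not values from this engagement:

```
# Sketch: DeployIfNotExists policy assignment at management group scope (IDs are placeholders)
az policy assignment create \
  --name "deploy-defender-servers-p2" \
  --scope "/providers/Microsoft.Management/managementGroups/<mg-name>" \
  --policy "<built-in-policy-definition-id>" \
  --mi-system-assigned \
  --location "<region>"

# Remediate VMs that were already non-compliant at assignment time
az policy remediation create \
  --name "remediate-defender-servers" \
  --management-group "<mg-name>" \
  --policy-assignment "deploy-defender-servers-p2"
```

New and newly Arc-connected machines are evaluated and remediated automatically; the explicit remediation task is only needed for the existing backlog.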
Ansible Automation Framework
Architecture
The Ansible control plane was deployed as a dedicated Linux VM inside the management subnet, accessible only via Azure Bastion. No SSH port was exposed externally.
```
Management Subnet
└── Ansible Control Node (Ubuntu 22.04)
    └── SSH key-based authentication (no password auth)
    └── Inventory: dynamic (Azure Resource Graph + on-prem static)
    └── Vault: credentials stored in Ansible Vault, keys in Azure Key Vault

Client Nodes (all environments)
└── SSH key installed at provisioning time
└── Dedicated ansible service account (no shell login, sudo for specific commands only)
└── Firewall rule: inbound SSH from Ansible control node IP only
```
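The control-node setup above can be captured in `ansible.cfg`; a minimal sketch, assuming the vault password is fetched from Azure Key Vault by a small wrapper script (the script path and inventory paths are illustrative, not taken from the engagement):

```ini
# ansible.cfg on the control node (illustrative paths)
[defaults]
inventory = ./inventory/azure_rm.yml,./inventory/onprem.ini
remote_user = ansible-svc
private_key_file = /etc/ansible/keys/ansible_rsa
# Wrapper script that pulls the vault password from Azure Key Vault at runtime
vault_password_file = /usr/local/bin/fetch-vault-pass.sh
host_key_checking = True

[privilege_escalation]
become = True
become_method = sudo
```

Keeping the vault password out of the filesystem (fetched at runtime rather than stored) is what lets the control node itself hold no standing secrets.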
Inventory Management
A hybrid inventory was built to cover all three environments:
Azure VMs — queried dynamically via the Azure Resource Manager inventory plugin:
```yaml
# inventory/azure_rm.yml
plugin: azure.azcollection.azure_rm
auth_source: auto
include_vm_resource_groups:
  - rg-production-workloads
  - rg-development
keyed_groups:
  - key: tags.environment
    prefix: env
  - key: tags.os_type
    prefix: os
```
On-premises and Arc-connected servers — maintained in a static inventory file with group variables:
```ini
# inventory/onprem.ini
[linux_servers]
srv-prod-01.internal ansible_host=10.10.1.11
srv-prod-02.internal ansible_host=10.10.1.12
srv-dev-01.internal  ansible_host=10.10.2.11

[windows_servers]
win-prod-01.internal ansible_host=10.10.1.21 ansible_connection=winrm
win-prod-02.internal ansible_host=10.10.1.22 ansible_connection=winrm

[linux_servers:vars]
ansible_user=ansible-svc
ansible_ssh_private_key_file=/etc/ansible/keys/ansible_rsa
ansible_become=true
ansible_become_method=sudo
```
Core Playbooks
Security Baseline Playbook
Applied to every server on first provisioning and run on a weekly schedule via cron to detect and correct drift:
```yaml
# playbooks/security-baseline.yml
---
- name: Apply security baseline
  hosts: all
  become: true
  vars_files:
    - ../vault/secrets.yml

  tasks:
    - name: Ensure auditd is installed and running
      package:
        name: auditd
        state: present
      notify: restart auditd

    - name: Disable root SSH login
      lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^PermitRootLogin'
        line: 'PermitRootLogin no'
        state: present
      notify: restart sshd

    - name: Disable password authentication
      lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^PasswordAuthentication'
        line: 'PasswordAuthentication no'
        state: present
      notify: restart sshd

    - name: Set login banner
      copy:
        content: |
          Authorized access only. All activity is monitored and logged.
        dest: /etc/issue.net

    - name: Configure SSH to use login banner
      lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^Banner'
        line: 'Banner /etc/issue.net'
        state: present
      notify: restart sshd

    - name: Ensure unattended-upgrades is configured
      apt:
        name: unattended-upgrades
        state: present
      when: ansible_os_family == "Debian"

    - name: Enable automatic security updates
      debconf:
        name: unattended-upgrades
        question: unattended-upgrades/enable_auto_updates
        vtype: boolean
        value: 'true'
      when: ansible_os_family == "Debian"

  handlers:
    - name: restart sshd
      service:
        name: sshd
        state: restarted

    - name: restart auditd
      service:
        name: auditd
        state: restarted
```
Defender for Servers Deployment Playbook
The core automation that replaced manual agent installation across the entire server estate:
```yaml
# playbooks/deploy-defender.yml
---
- name: Deploy Microsoft Defender for Servers agent
  hosts: linux_servers
  become: true
  vars:
    workspace_id: "{{ lookup('azure_keyvault_secret', 'law-workspace-id') }}"
    workspace_key: "{{ lookup('azure_keyvault_secret', 'law-workspace-key') }}"

  tasks:
    - name: Check if OMS agent already installed
      stat:
        path: /opt/microsoft/omsagent/bin/omsagent
      register: oms_installed

    - name: Download OMS agent installer
      get_url:
        url: "https://raw.githubusercontent.com/Microsoft/OMS-Agent-for-Linux/master/installer/scripts/onboard_agent.sh"
        dest: /tmp/onboard_agent.sh
        mode: '0755'
      when: not oms_installed.stat.exists

    - name: Install and onboard OMS agent
      command: >
        /tmp/onboard_agent.sh
        -w "{{ workspace_id }}"
        -s "{{ workspace_key }}"
        -d opinsights.azure.com
      when: not oms_installed.stat.exists
      register: oms_install_result

    - name: Verify agent is running
      service:
        name: omsagent
        state: started
        enabled: yes
      register: agent_status

    - name: Confirm agent visible in workspace
      uri:
        url: "https://management.azure.com/subscriptions/{{ azure_subscription_id }}/resourceGroups/{{ resource_group }}/providers/Microsoft.OperationalInsights/workspaces/{{ workspace_name }}/computers?api-version=2020-08-01"
        method: GET
        headers:
          Authorization: "Bearer {{ azure_access_token }}"
      register: workspace_check
      until: workspace_check.json.value | selectattr('name', 'equalto', inventory_hostname) | list | length > 0
      retries: 12
      delay: 30

    - name: Log deployment result
      debug:
        msg: "Defender agent successfully deployed and confirmed on {{ inventory_hostname }}"
      when: workspace_check is succeeded
```
Drift Detection and Remediation
A weekly scheduled run compared the actual state of each server against the baseline and generated a compliance report:
```yaml
# playbooks/compliance-check.yml
---
- name: Security compliance check
  hosts: all
  become: true
  gather_facts: true

  tasks:
    - name: Check SSH root login disabled
      command: grep -c "^PermitRootLogin no" /etc/ssh/sshd_config
      register: root_login_check
      changed_when: false
      failed_when: false

    # One service_facts call populates the 'services' dict for all checks below
    - name: Gather service states (Defender agent, auditd)
      service_facts:

    - name: Compile compliance status
      set_fact:
        compliance:
          hostname: "{{ inventory_hostname }}"
          ssh_root_disabled: "{{ root_login_check.rc == 0 }}"
          defender_running: "{{ 'omsagent' in services and services['omsagent'].state == 'running' }}"
          auditd_running: "{{ 'auditd' in services and services['auditd'].state == 'running' }}"

    - name: Output compliance report
      delegate_to: localhost
      lineinfile:
        path: /var/log/ansible/compliance-{{ ansible_date_time.date }}.csv
        line: "{{ compliance.hostname }},{{ compliance.ssh_root_disabled }},{{ compliance.defender_running }},{{ compliance.auditd_running }}"
        create: yes
```
Windows Server Deployment via SCCM
For the Windows-heavy segments of the on-premises estate, SCCM (System Center Configuration Manager) was used as an alternative deployment mechanism alongside Ansible. SCCM provided:
- Centralized software distribution with pre-configured packages
- Staged deployment to pilot groups before full rollout
- Compliance reporting via the SCCM console
- Integration with Windows Server Update Services (WSUS) for patch management
The Defender for Endpoint package was distributed via SCCM collection targeting — servers in the "Security Unprotected" collection (automatically populated via compliance baseline evaluation) received the agent package automatically.
This dual-path approach — Ansible for Linux and mixed environments, SCCM for Windows-heavy on-premises — was a deliberate decision to use the most operationally efficient tool for each context rather than forcing a single deployment method across all environments.
Security Operations Integration
Continuous Monitoring
With PIM activation events and Defender for Servers telemetry both flowing into the Log Analytics workspace, a unified monitoring layer was built:
Privileged Access Monitoring
```kusto
// Privileged role activations with duration and justification
AuditLogs
| where TimeGenerated > ago(7d)
| where Category == "RoleManagement"
| where OperationName contains "Add member to role in PIM completed"
| extend
    RoleName = tostring(TargetResources[0].displayName),
    ActivatedBy = tostring(InitiatedBy.user.userPrincipalName),
    Justification = tostring(AdditionalDetails[0].value),
    Duration = tostring(AdditionalDetails[1].value)
| summarize ActivationCount = count() by ActivatedBy, RoleName, bin(TimeGenerated, 1d)
| order by ActivationCount desc
```
Defender Coverage Gap Detection
```kusto
// Servers whose last heartbeat is more than 12 hours old
// Note: the lookback must be wide, or servers silent for longer than the
// window would drop out of the results entirely instead of being flagged
Heartbeat
| where TimeGenerated > ago(30d)
| summarize LastSeen = max(TimeGenerated) by Computer, OSType, ResourceGroup
| where LastSeen < ago(12h)
| project Computer, OSType, ResourceGroup, LastSeen,
    HoursSinceLastSeen = datetime_diff('hour', now(), LastSeen)
| order by HoursSinceLastSeen desc
```
Configuration Drift Alert
A Sentinel analytics rule fired when Ansible's compliance report (ingested as a custom log) showed any server with Defender agent in a stopped state:
```kusto
AnsibleComplianceReport_CL
| where TimeGenerated > ago(25h)   // slight overlap with the 24h report cadence
| where defender_running_b == false
| project TimeGenerated, hostname_s, ssh_root_disabled_b, auditd_running_b
```
Incident Response
For Defender-generated critical alerts, a Logic App playbook was configured to:
- Extract the affected server hostname from the Sentinel incident
- Check PIM logs to identify any privileged activations on that server in the preceding 4 hours
- If a PIM activation correlates with the alert timeline: automatically notify the security team with the activation details and justification provided
- If no PIM activation: escalate as a potential unauthorized access incident with higher priority
- Create an ITSM ticket with the full context — Defender alert, correlated PIM events, server compliance status
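The correlation step above can be sketched in KQL, assuming the Logic App injects the incident's alert time as a parameter before running the query (the `{alert_time}` placeholder and the 4-hour window are illustrative):

```kusto
// Sketch: PIM activations in the 4 hours preceding a Defender alert
// {alert_time} is a placeholder substituted by the Logic App at runtime
let AlertTime = datetime({alert_time});
AuditLogs
| where TimeGenerated between ((AlertTime - 4h) .. AlertTime)
| where OperationName has "Add member to role in PIM completed"
| project TimeGenerated,
    ActivatedBy = tostring(InitiatedBy.user.userPrincipalName),
    RoleName = tostring(TargetResources[0].displayName),
    Justification = tostring(AdditionalDetails[0].value)
```

An empty result set is the interesting case: privileged activity on the server with no corresponding PIM activation is what triggers the higher-priority escalation path.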
Results
| Dimension | Before | After |
|---|---|---|
| Standing privileged accounts | Multiple permanent Global Admins | Zero — all converted to eligible with JIT |
| Average privilege exposure window | Always-on (24/7/365) | Maximum 8 hours per activation, audit-logged |
| MFA on privileged operations | Not enforced | Mandatory on every role activation |
| Approval workflow | None | Configured for all Contributor and above |
| Server security agent coverage | Inconsistent — unknown actual percentage | 100% — enforced via policy + Ansible |
| Time to deploy agent on new server | Days (manual) | Minutes (Ansible playbook on first provision) |
| Configuration drift detection | None | Weekly automated scan with remediation |
| Cross-environment visibility | Siloed per hosting platform | Unified in Log Analytics via Azure Arc |
| Compliance reporting | Manual extraction per audit cycle | Continuous — Defender for Cloud compliance dashboard |
| Privileged access audit trail | Fragmented AD logs | Centralised PIM audit log with full activation history |
Compromising an admin credential now doesn't get an attacker much. The account is eligible but inactive — it still needs MFA, a justification, and in most cases, approval before anything actually happens. That's a fundamentally different risk profile.
The Ansible automation brought the server estate under a consistent baseline for the first time. Weekly drift detection means it can't quietly degrade between audits. The security posture the audit sees is the one that's actually in place.
