Automated Attack Surface Discovery with OWASP ZAP: CI/CD Integration & Compliance Workflows

Modern web architectures expose endpoints faster than any manual inventory can keep pace with. OWASP ZAP, deployed as an automated discovery engine inside a DevSecOps pipeline, continuously maps that expanding surface and flags regressions the moment they appear. This guide is part of Attack Surface Mapping Techniques, which sits within the broader Threat Modeling Fundamentals & Methodology practice. Where relevant, this page also cross-references defining trust boundaries for scope enforcement and the STRIDE framework for classifying what ZAP uncovers.

Prerequisites

  • OWASP ZAP 2.14+ (Docker image ghcr.io/zaproxy/zaproxy:stable or local install)
  • A target application accessible from the CI runner (Docker Compose or a dedicated staging slot)
  • A CI/CD system with secrets management (GitHub Actions, GitLab CI, or equivalent)
  • Python 3.10+ for API scripting; jq for report post-processing
  • A dedicated CI service account with least-privilege access to the target app

Expected Outcomes

  • ZAP context configured with strict inclusion/exclusion rules and authenticated sessions
  • Baseline DAST scan running on every pull request, uploading SARIF to the repository’s Security tab
  • SPA routes, GraphQL endpoints, and WebSocket upgrades included in the discovered attack surface
  • Compliance evidence JSON mapped to SOC 2, OWASP ASVS, and PCI DSS controls, ready for auditors
  • Discovered endpoints diffed against the threat model registry with automated ticket creation on drift

Step 1: Define ZAP Scan Scope and Authentication Contexts

Dynamic scanning without explicit boundaries generates false positives, violates rate limits, and risks data corruption. ZAP contexts enforce strict inclusion/exclusion rules and define authentication lifecycles. Align these boundaries with the trust boundaries already established in your architecture documentation so the scanner never crosses into a zone it does not own.

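ZAP contexts can be created through the REST API, imported from a .context file, or declared in an Automation Framework plan. The JSON below is a schematic summary of the fields such a context carries: a strict include/exclude scope, a JSON-based login, logged-in/logged-out indicators, and cookie-based session management. Translate it into whichever of those formats your pipeline uses.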

{
  "context": {
    "name": "production-api-scope",
    "description": "Authenticated API surface for CI/CD baseline",
    "inScope": true,
    "urls": [
      "https://api\\.example\\.com/.*",
      "https://app\\.example\\.com/.*"
    ],
    "excludeFromScan": [
      "https://api\\.example\\.com/health",
      "https://api\\.example\\.com/admin/.*",
      "https://cdn\\.thirdparty\\.com/.*"
    ],
    "authentication": {
      "type": "json",
      "method": "POST",
      "loginUrl": "https://api.example.com/v1/auth/login",
      "loginRequestData": "{\"email\":\"{%username%}\",\"password\":\"{%password%}\"}",
      "loggedInIndicator": "\"status\":\"authenticated\"",
      "loggedOutIndicator": "\"status\":\"unauthorized\""
    },
    "users": [
      {
        "name": "test-scanner-user",
        "credentials": {
          "username": "${ZAP_SCANNER_USERNAME}",
          "password": "${ZAP_SCANNER_PASSWORD}"
        }
      }
    ],
    "sessionManagement": {
      "type": "cookieBasedSessionManagement",
      "parameters": {
        "cookieName": "session_id"
      }
    }
  }
}
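One way to apply the scope portion of this context is through ZAP's context API. The sketch below is a minimal example, not a full importer: it assumes ZAP's API listens on http://localhost:8080, maps the "urls" array to context inclusions and "excludeFromScan" to context exclusions (a design choice), and leaves authentication and users to be configured separately. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ZAP_URL = "http://localhost:8080"  # assumption: ZAP API on its default port


def build_scope_calls(context_name, include_regexes, exclude_regexes):
    """Return (endpoint, params) pairs that create an in-scope context and set its regexes."""
    calls = [
        ("context/action/newContext", {"contextName": context_name}),
        ("context/action/setContextInScope",
         {"contextName": context_name, "booleanInScope": "true"}),
    ]
    calls += [("context/action/includeInContext", {"contextName": context_name, "regex": rx})
              for rx in include_regexes]
    calls += [("context/action/excludeFromContext", {"contextName": context_name, "regex": rx})
              for rx in exclude_regexes]
    return calls


def apply_calls(calls, api_key, zap_url=ZAP_URL):
    """Send each call to ZAP's JSON API as a GET with query parameters and collect the replies."""
    replies = []
    for path, params in calls:
        url = f"{zap_url}/JSON/{path}/?{urllib.parse.urlencode(params)}"
        req = urllib.request.Request(url, headers={"X-ZAP-API-Key": api_key})
        with urllib.request.urlopen(req, timeout=30) as resp:
            replies.append(json.load(resp))
    return replies


def apply_context_file(path, api_key):
    """Load the context JSON above and apply only its scope regexes."""
    with open(path) as fh:
        ctx = json.load(fh)["context"]
    # excludeFromScan entries become context exclusions in this sketch
    calls = build_scope_calls(ctx["name"], ctx["urls"], ctx["excludeFromScan"])
    return apply_calls(calls, api_key)
```

Separating call construction from execution keeps the scope logic reviewable without a running ZAP instance. Re-running against a ZAP session that already holds a context with the same name returns an error, so start from a fresh session or remove the old context first.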

Security boundaries to enforce at this step:

  • Never scan /admin, /internal, or third-party CDN paths.
  • Use isolated CI service accounts with least-privilege RBAC. This is the same principle behind injection attack prevention: a smaller blast radius limits the damage from any tooling misconfiguration.
  • Use excludeFromScan regexes to keep destructive endpoints such as DELETE /v1/users/* out of the scan. These regexes match URLs, not HTTP methods, so the exclusion covers every method on the matching path.
  • Rotate credentials via CI secret managers; never hardcode credentials in the context file.
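Scope regexes are easy to get subtly wrong, so it can help to check them against sample URLs before committing the context. This is a minimal sketch that assumes ZAP evaluates context regexes against the whole URL (a full match, which is why the patterns end in ".*"); the /v1/users/ exclusion and the sample URLs are hypothetical.

```python
import re

INCLUDE = [r"https://api\.example\.com/.*", r"https://app\.example\.com/.*"]
EXCLUDE = [
    r"https://api\.example\.com/health",
    r"https://api\.example\.com/admin/.*",
    r"https://cdn\.thirdparty\.com/.*",
    r"https://api\.example\.com/v1/users/.*",  # hypothetical: path behind DELETE /v1/users/*
]


def in_scope(url, include=INCLUDE, exclude=EXCLUDE):
    """True when the URL fully matches an include regex and no exclude regex."""
    if any(re.fullmatch(p, url) for p in exclude):
        return False
    return any(re.fullmatch(p, url) for p in include)
```

Running a handful of known-good and known-dangerous URLs through in_scope in a unit test catches unanchored or over-broad patterns before they reach the scanner.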

Step 2: CI/CD Pipeline Integration and Baseline Gating

Figure: ZAP baseline CI/CD flow. A pull request or mainline push starts the app with docker compose, runs the ZAP baseline scan (action-baseline), uploads SARIF to the GitHub Security tab on every run, and blocks on High findings for mainline only.

Baseline scans spider the target and apply ZAP's passive rules, so they catch obvious misconfigurations and finish fast enough to gate every pull request. The workflow below reports findings without blocking pull requests and fails the job on pushes to main.

name: ZAP Baseline DAST Scan
on:
  pull_request:
    branches: [main, develop]
  push:
    branches: [main]

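jobs:
  zap-baseline:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      actions: read            # needed by upload-sarif in private repositories
      security-events: write   # required to upload SARIF to code scanning
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Start Target App
        run: docker compose -f docker-compose.test.yml up -d --wait

      - name: Run ZAP Baseline Scan
        uses: zaproxy/action-baseline@vX.Y.Z   # pin to a published release tag
        with:
          target: 'http://localhost:8080'
          # -a: include alpha passive rules; -j: also run the AJAX spider;
          # -r: HTML report name; -d: debug output
          cmd_options: '-a -j -r zap-baseline.html -d'
          allow_issue_writing: false
          fail_action: ${{ github.ref == 'refs/heads/main' }}
          rules_file_name: '.zap/rules.tsv'
          token: ${{ secrets.GITHUB_TOKEN }}
          artifact_name: 'zap-baseline-report'

      # action-baseline writes HTML, Markdown and JSON reports but no SARIF,
      # so a conversion step must produce zap-baseline.sarif before this upload.
      - name: Upload SARIF Report
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: 'zap-baseline.sarif'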
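The baseline action writes report_json.json alongside its HTML and Markdown reports but does not emit SARIF, so the upload step needs a conversion in between. The sketch below assumes the traditional JSON layout produced by ZAP's packaged scans (a "site" list whose entries hold "alerts" with "pluginid", "alert", "riskcode", "riskdesc" and "instances" carrying a "uri"); the file names are assumptions. GitHub expects artifact locations to be repository paths, so check how URL-based locations render after a test upload.

```python
import json

# ZAP riskcode -> SARIF level (3 High, 2 Medium, 1 Low, 0 Informational)
LEVELS = {"3": "error", "2": "warning", "1": "note", "0": "note"}


def zap_json_to_sarif(report):
    """Build a minimal SARIF 2.1.0 document from a ZAP traditional JSON report."""
    sites = report.get("site", [])
    if isinstance(sites, dict):  # tolerate a single-site object
        sites = [sites]
    rules, results = {}, []
    for site in sites:
        for alert in site.get("alerts", []):
            rule_id = str(alert["pluginid"])
            name = alert.get("alert") or alert.get("name", rule_id)
            rules.setdefault(rule_id, {"id": rule_id, "name": name,
                                       "shortDescription": {"text": name}})
            # Fall back to the site URL when an alert carries no instances
            instances = alert.get("instances", []) or [{"uri": site.get("@name", "")}]
            for inst in instances:
                results.append({
                    "ruleId": rule_id,
                    "level": LEVELS.get(str(alert.get("riskcode")), "note"),
                    "message": {"text": f"{name} ({alert.get('riskdesc', '')})"},
                    "locations": [{"physicalLocation": {
                        "artifactLocation": {"uri": inst.get("uri", "")}}}],
                })
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "ZAP",
                                "informationUri": "https://www.zaproxy.org/",
                                "rules": list(rules.values())}},
            "results": results,
        }],
    }


def main(src="report_json.json", dst="zap-baseline.sarif"):
    """Read the ZAP report and write the SARIF file the upload step expects."""
    with open(src) as fh:
        sarif = zap_json_to_sarif(json.load(fh))
    with open(dst, "w") as fh:
        json.dump(sarif, fh, indent=2)
```

Called from a step placed between the scan and the upload (with if: always(), so it still runs when fail_action fails the scan), main() produces the zap-baseline.sarif path the workflow already references.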

Pipeline gating strategy:

  • fail_action evaluates to false on PRs so merges are not blocked, but alerts appear in the GitHub Security tab for developer review.
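  • On pushes to main, fail_action resolves to true and the job fails whenever the scan exits with a non-zero status, which by default includes WARN-level alerts. To block only on High findings, mark the High-risk rule IDs as FAIL in .zap/rules.tsv and add -I to cmd_options so warnings no longer fail the run. ZAP has no Critical risk level; its levels are High, Medium, Low, and Informational.
  • Use .zap/rules.tsv to set each rule's action (IGNORE, INFO, WARN, or FAIL) per environment. Suppress known false positives by rule ID rather than by risk category to avoid masking real issues.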
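A risk-based alternative to per-rule gating is a small script that reads the report_json.json file written by the baseline action and fails only on High-risk alerts (riskcode "3"). This is a sketch under the same report-layout assumption as the packaged scans; the SUPPRESSED set is a hypothetical stand-in for triaged false positives, playing the role IGNORE lines play in rules.tsv.

```python
import json

SUPPRESSED = set()  # hypothetical: add rule IDs triaged as false positives


def high_findings(report, suppressed=SUPPRESSED):
    """Return sorted (rule_id, name) pairs for unsuppressed High-risk alerts."""
    sites = report.get("site", [])
    if isinstance(sites, dict):  # tolerate a single-site object
        sites = [sites]
    return sorted({
        (str(a["pluginid"]), a.get("alert", ""))
        for s in sites
        for a in s.get("alerts", [])
        if str(a.get("riskcode")) == "3" and str(a["pluginid"]) not in suppressed
    })


def gate(path="report_json.json"):
    """Print each High finding and return a process exit code (1 = block)."""
    with open(path) as fh:
        found = high_findings(json.load(fh))
    for rule_id, name in found:
        print(f"HIGH {rule_id}: {name}")
    return 1 if found else 0
```

Passing gate()'s return value to sys.exit in a mainline-only step blocks promotion on High findings while pull requests keep reporting without blocking.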

Step 3: Handle SPAs, GraphQL, and WebSocket Surfaces

Traditional link-following crawlers fail on client-side routing, GraphQL mutations, and WebSocket upgrade requests. ZAP requires explicit configuration to map these surfaces, otherwise entire attack vectors are invisible to the scanner — a gap that compounds when you apply the STRIDE framework to assess what those missed surfaces expose.

AJAX Spider for SPA Route Discovery

Hash-based (#route) and History API (/route) routes only appear after client-side JavaScript runs, so discovering them requires a browser that renders the DOM. The Python script below drives ZAP's AJAX Spider through the API and queues the routes it finds for active scanning.

#!/usr/bin/env python3
"""
ZAP Standalone Script: SPA Route Discovery & AJAX Spider Trigger
Run via: ZAP_API_KEY=<key> python spa_crawler.py
Requires: OWASP ZAP running on ZAP_URL with the API enabled; requests installed
"""
import os
import time

import requests

ZAP_API_KEY = os.environ["ZAP_API_KEY"]  # read from the environment, never hardcoded
ZAP_URL = os.environ.get("ZAP_URL", "http://localhost:8080")
TARGET = "https://app.example.com"
HEADERS = {"X-ZAP-API-Key": ZAP_API_KEY}


def zap(path, **params):
    """Call a ZAP JSON API endpoint with GET and return the decoded body."""
    resp = requests.get(f"{ZAP_URL}/JSON/{path}/", params=params, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.json()


def configure_and_run_ajax_spider():
    # Configure AJAX Spider for SPA wait times and route extraction
    zap("ajaxSpider/action/setOptionBrowserId", String="firefox-headless")
    zap("ajaxSpider/action/setOptionMaxDuration", Integer="15")  # minutes
    zap("ajaxSpider/action/setOptionEventWait", Integer="500")   # milliseconds

    # Trigger crawl, limited to in-scope URLs
    print(f"Spider started: {zap('ajaxSpider/action/scan', url=TARGET, inScope='true')}")

    # Poll until the spider reports it has stopped
    while zap("ajaxSpider/view/status").get("status") != "stopped":
        time.sleep(5)

    # Read the URLs recorded under the target from the Sites tree
    # (core/view/urls returns plain URL strings) and queue an active scan for each
    urls = zap("core/view/urls", baseurl=TARGET).get("urls", [])
    for url in urls:
        zap("ascan/action/scan", url=url, recurse="false", inScopeOnly="true")
    print(f"Active scan queued for {len(urls)} SPA routes.")


if __name__ == "__main__":
    configure_and_run_ajax_spider()

GraphQL and WebSocket Configuration

Surface | ZAP Configuration | Key Risk
--- | --- | ---
GraphQL introspection | Add a Content-Type: application/json header; seed {"query":"{__schema{types{name}}}"} as the initial request body | Schema enumeration exposes all types and mutations to an attacker
WebSocket | Enable via Options > WebSocket > Enable WebSocket Support; add custom active scan scripts for frame fuzzing | Real-time data exfiltration; message injection
Server-Sent Events | Passive scan only; record stream URLs manually and add them to the context | Persistent data leakage from event channels
Service mesh internal traffic | Exclude .*\.svc\.cluster\.local.* with a context regex; scan only ingress controller URLs | Scope creep into internal service-to-service channels
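The GraphQL row can be scripted by sending the introspection query through ZAP's proxy, so the endpoint and its response land in the Sites tree for later scanning. This is a minimal sketch: GRAPHQL_URL is a hypothetical endpoint, ZAP is assumed to proxy on localhost:8080, and HTTPS targets additionally require the client to trust ZAP's root CA certificate. ZAP's GraphQL add-on can also import a schema directly; this sketch only covers the manual seeding the table describes.

```python
import json
import urllib.request

ZAP_PROXY = "http://localhost:8080"               # assumption: ZAP proxy address
GRAPHQL_URL = "https://api.example.com/graphql"   # hypothetical endpoint
INTROSPECTION = {"query": "{__schema{types{name}}}"}


def build_request(url=GRAPHQL_URL):
    """Build the POST request carrying the introspection query as JSON."""
    return urllib.request.Request(
        url,
        data=json.dumps(INTROSPECTION).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_via_zap(req, proxy=ZAP_PROXY):
    """Route the request through ZAP so it is recorded, and return the decoded reply."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    with opener.open(req, timeout=30) as resp:
        return json.load(resp)
```

A non-empty __schema in the reply confirms introspection is enabled, which is itself worth recording as a finding in the threat model.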

Cross-site scripting vulnerabilities are often first surfaced by ZAP’s reflected-content checks — see XSS mitigation patterns for the corresponding remediation guidance once ZAP alerts on these.

Step 4: Compliance Evidence Generation and Threat Model Synchronization

Compliance Mapping

ZAP alert IDs carry structured metadata. The table below maps high-value alerts to the audit controls you are most likely to be assessed against.

ZAP Rule ID | Vulnerability | SOC 2 (CC6.1) | OWASP ASVS 4.0 | PCI DSS (6.3.2)
--- | --- | --- | --- | ---
10011 | Cookie Without Secure Flag | Access Control | V3.4.1 | Secure Coding
10010 | Cookie Without HttpOnly Flag | Data Protection | V3.4.2 | Secure Coding
10020 | X-Frame-Options Header Missing | System Integrity | V14.4.7 | Secure Coding
40012 | Cross-Site Scripting (Reflected) | Input Validation | V5.3.3 | Secure Coding
40014 | Cross-Site Scripting (Persistent) | Input Validation | V5.3.3 | Secure Coding
90033 | Loosely Scoped Cookie | Data Protection | V3.4.5 | Secure Coding

Automated Evidence Export

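# Export alerts for the API host with full metadata
curl -s "http://localhost:8080/JSON/core/view/alerts/?apikey=${ZAP_API_KEY}&baseurl=https://api.example.com" \
  > zap-alerts.json

# Filter High and Medium alerts; map each rule ID to compliance controls.
# The rule ID is .pluginId (.id is the per-alert instance ID).
# 10011 = Cookie Without Secure Flag, 10010 = Cookie No HttpOnly Flag; both are
# usually Low risk, so add "Low" to the filter if they must appear in the evidence.
jq '[.alerts[] | select(.risk == "High" or .risk == "Medium") |
  {
    rule_id: .pluginId,
    alert: .alert,
    risk: .risk,
    url: .url,
    control: (
      if .pluginId == "10011" then "SOC2_CC6.1 / ASVS_V3.4.1"
      elif .pluginId == "10010" then "SOC2_CC6.1 / ASVS_V3.4.2"
      elif .pluginId == "10020" then "SOC2_CC6.1 / ASVS_V14.4.7"
      elif .pluginId == "40012" or .pluginId == "40014" then "PCI_DSS_6.3.2 / ASVS_V5.3.3"
      else "ISO_27001_A14 / ASVS_V14"
      end
    ),
    evidence: .evidence,
    remediation: .solution
  }]' zap-alerts.json > compliance-evidence.json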
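When the mapping grows beyond a few rules, a lookup table is easier to review than a chain of if/elif branches. The sketch below expresses the same filter in Python over the "alerts" list returned by core/view/alerts, keyed on the rule ID in "pluginId" (ZAP rule 10011 is Cookie Without Secure Flag and 10010 is Cookie No HttpOnly Flag). The "UNMAPPED" fallback is a deliberate assumption: it flags alerts for manual review instead of assigning them a blanket control.

```python
CONTROLS = {
    "10011": "SOC2_CC6.1 / ASVS_V3.4.1",     # Cookie Without Secure Flag
    "10010": "SOC2_CC6.1 / ASVS_V3.4.2",     # Cookie No HttpOnly Flag
    "10020": "SOC2_CC6.1 / ASVS_V14.4.7",    # Anti-clickjacking header missing
    "40012": "PCI_DSS_6.3.2 / ASVS_V5.3.3",  # XSS (Reflected)
    "40014": "PCI_DSS_6.3.2 / ASVS_V5.3.3",  # XSS (Persistent)
    "90033": "SOC2_CC6.1 / ASVS_V3.4.5",     # Loosely Scoped Cookie
}


def to_evidence(alerts, risks=("High", "Medium"), fallback="UNMAPPED"):
    """Keep alerts at the given risk levels and attach the mapped control."""
    return [
        {
            "rule_id": a["pluginId"],
            "risk": a["risk"],
            "url": a.get("url", ""),
            "control": CONTROLS.get(a["pluginId"], fallback),
            "evidence": a.get("evidence", ""),
            "remediation": a.get("solution", ""),
        }
        for a in alerts
        if a.get("risk") in risks
    ]
```

Loading zap-alerts.json with json.load and passing its "alerts" list to to_evidence yields records that can be written to compliance-evidence.json.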

Store compliance-evidence.json in an immutable artifact repository (AWS S3 with Object Lock or Azure Blob WORM storage) to satisfy auditor retention requirements.

Threat Model Drift Detection

Static threat models diverge from the live application as code evolves. The diff pipeline below compares ZAP-discovered endpoints against a Git-tracked threat-model.json and raises a ticket on any unregistered surface. This enforces the continuous validation loop described in threat model documentation patterns.

# Extract every URL ZAP has recorded under the target (traditional and AJAX spider alike),
# reduced to its path with any query string removed
curl -s "http://localhost:8080/JSON/core/view/urls/?apikey=${ZAP_API_KEY}&baseurl=https://api.example.com" \
  | jq -r '.urls[] | sub("^https?://[^/]+"; "") | sub("\\?.*$"; "") | if . == "" then "/" else . end' \
  | sort -u > discovered_endpoints.txt

# Extract expected endpoints from the threat model registry
# (paths are compared literally, so templated entries like /v1/users/{id} will not match)
jq -r '.endpoints[].path' threat-model.json | sort -u > expected_endpoints.txt

# Identify net-new endpoints (present in scan but missing from model)
NEW=$(comm -13 expected_endpoints.txt discovered_endpoints.txt)
if [ -n "$NEW" ]; then
  echo "DRIFT DETECTED: unregistered endpoints found:"
  echo "$NEW"
  # Raise a GitHub issue automatically (gh needs GH_TOKEN set in CI)
  gh issue create \
    --title "Attack surface drift: unregistered endpoints detected" \
    --body "$(echo "$NEW" | sed 's/^/- /')" \
    --label "security,p2"
fi

If expected endpoints are absent from scan results, validate authentication context or routing configuration before assuming they were removed.
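A literal line-by-line comparison flags every concrete URL that corresponds to a templated threat-model entry, such as /v1/users/42 against /v1/users/{id}. The sketch below shows a template-aware diff, assuming threat-model.json uses {name} placeholders in its .endpoints[].path values and that each placeholder stands for exactly one path segment; both the placeholder syntax and the function names are assumptions.

```python
import re
from urllib.parse import urlsplit


def template_to_regex(path):
    """Turn /v1/users/{id} into a regex where each {name} matches one path segment."""
    parts = re.split(r"(\{[^/{}]+\})", path)
    body = "".join(
        "[^/]+" if p.startswith("{") and p.endswith("}") else re.escape(p) for p in parts
    )
    return re.compile(body)


def unregistered(discovered_urls, model_paths):
    """Return sorted paths seen by ZAP that match no registered template."""
    patterns = [template_to_regex(p) for p in model_paths]
    # Keep only the path component; a bare host maps to "/"
    paths = {urlsplit(u).path or "/" for u in discovered_urls}
    return sorted(p for p in paths if not any(rx.fullmatch(p) for rx in patterns))
```

Feeding unregistered the URL list from core/view/urls and the paths from threat-model.json produces the same drift list as the shell pipeline, minus false alarms from parameterized routes.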

Verification

After running a full scan, confirm the pipeline is working correctly:

# 1. Confirm ZAP discovered the expected number of URLs
curl -s "http://localhost:8080/JSON/spider/view/results/?apikey=${ZAP_API_KEY}&scanId=0" \
  | jq '.results | length'

# 2. Confirm no High alerts are present (exit 1 if any found)
HIGH=$(curl -s "http://localhost:8080/JSON/core/view/alerts/?apikey=${ZAP_API_KEY}" \
  | jq '[.alerts[] | select(.risk == "High")] | length')
echo "High alerts: $HIGH"
[ "$HIGH" -eq 0 ] || exit 1

# 3. Count the results recorded in the SARIF output
jq '.runs[0].results | length' zap-baseline.sarif

# 4. Confirm the compliance evidence file exists and holds at least one entry
jq -e 'length > 0' compliance-evidence.json > /dev/null && echo "Evidence file OK" || echo "Evidence file MISSING or empty"

Expected baseline output for a healthy staging environment: zero High alerts, compliance-evidence.json with at least one mapped entry, and the SARIF file uploaded successfully to the GitHub Security tab under the PR.
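Counting results alone does not show which alert categories the scan produced. A minimal sketch that tallies SARIF results per rule ID and lists expected rule IDs that are absent; the expected list is something you would tune to what your staging app is known to trigger, and the function names are hypothetical.

```python
from collections import Counter


def sarif_rule_counts(sarif):
    """Count SARIF results per ruleId across all runs."""
    return Counter(
        r.get("ruleId", "?") for run in sarif.get("runs", []) for r in run.get("results", [])
    )


def missing_rules(sarif, expected):
    """Return the sorted expected rule IDs that have no results in the SARIF file."""
    return sorted(set(expected) - set(sarif_rule_counts(sarif)))
```

A rule that always fires on staging (for example an informational header check) makes a useful canary: if it disappears, the scan probably never reached the application.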

Troubleshooting

Failure Mode | Diagnosis | Fix
--- | --- | ---
ZAP reports 0 discovered URLs | Authentication context misconfigured; the scanner hits the login redirect on every request | Test the loggedInIndicator regex against a real auth response captured with curl; click the “Test” button in ZAP’s Authentication panel before CI runs
AJAX Spider finds no SPA routes | eventWait too short for slow JS rendering, or Firefox headless not installed on the runner | Increase eventWait to 1000 ms; add --shm-size=2g to the Docker run flags; confirm Firefox is available in the CI image
High false-positive rate on third-party domains | excludeFromScan regexes not anchored correctly | Prefix each regex with https:// and suffix it with /.*; validate with ZAP’s Context editor before committing
SARIF upload fails with “file not found” | zaproxy/action-baseline writes HTML, Markdown and JSON reports but no SARIF; the -j flag enables the AJAX spider and does not produce SARIF | Add a step that converts report_json.json to zap-baseline.sarif before the upload; check that the runner’s working directory matches the SARIF path in the upload step
Compliance evidence JSON is empty | ZAP found no alerts matching the select(.risk == "High" or .risk == "Medium") filter | Include "Low" in the filter for an initial run to confirm the pipeline works, then investigate why higher-risk alerts are absent
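For the first row, the logged-in and logged-out indicators can be checked offline against captured auth responses before a CI run. A minimal sketch, assuming the indicator patterns from the Step 1 context and hypothetical sample bodies; ZAP evaluates the indicators against the response it receives, while this check only looks at a body string.

```python
import re

LOGGED_IN = r'"status":"authenticated"'
LOGGED_OUT = r'"status":"unauthorized"'


def classify(body, logged_in=LOGGED_IN, logged_out=LOGGED_OUT):
    """Return 'in', 'out', or 'ambiguous' depending on which indicator matches."""
    is_in = re.search(logged_in, body) is not None
    is_out = re.search(logged_out, body) is not None
    if is_in and not is_out:
        return "in"
    if is_out and not is_in:
        return "out"
    return "ambiguous"
```

An "ambiguous" result on a real login response is the usual cause of zero discovered URLs. A body serialized with a space after the colon, such as {"status": "authenticated"}, misses the strict indicator; a tolerant pattern like "status"\s*:\s*"authenticated" avoids that.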