Mapping Trust Boundaries in Cloud-Native Apps

Cloud-native architectures have eliminated the static network perimeter, replacing it with ephemeral, dynamically orchestrated workloads where containers, serverless functions, and managed services communicate across transient paths. Without explicit boundary mapping, lateral movement after an initial compromise can reach every service in a namespace. This guide shows you how to discover, enforce, and continuously validate those boundaries in production — as part of the Defining Trust Boundaries discipline inside Threat Modeling Fundamentals & Methodology.

Prerequisites

  • A Kubernetes cluster (1.27+) with kubectl access, or an equivalent cloud-managed offering (EKS, GKE, AKS)
  • Helm 3 installed for deploying Cilium or Istio
  • opa and conftest CLI tools available in your CI environment
  • Terraform or equivalent IaC tooling for cloud IAM resources
  • Familiarity with attack surface mapping techniques and basic YAML manifests

Expected Outcomes

  • A live, version-controlled service dependency graph that reflects runtime topology rather than stale architecture diagrams
  • Cryptographic identity (SPIFFE SVIDs) on every workload with mTLS enforced at the mesh layer
  • OPA policy gates in CI/CD that block boundary-violating deployments before they reach production
  • Automated compliance evidence mapped to SOC 2 CC6.1, OWASP ASVS 1.1.2, NIST SP 800-207, and PCI DSS Requirement 1.2.1

Step 1: Discover Live Data Flows with eBPF and OpenTelemetry

Trust boundaries in cloud environments are defined by data flow, not network topology. Accurate mapping requires tracing requests across API gateways, service meshes, event buses, and serverless triggers in real time — not from a whiteboard.

Deploy eBPF Probes

Install Cilium as the CNI, or layer Tetragon on top of an existing CNI, to observe kernel-level socket operations. eBPF captures process-to-process communication without a sidecar, so it also covers workloads where you cannot inject sidecar containers. It does need a privileged agent on every node, which fully managed serverless node types such as EKS Fargate do not permit.

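helm repo add cilium https://helm.cilium.io/
helm repo update
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true
# Tetragon is packaged as a separate chart in the same repository
helm install tetragon cilium/tetragon --namespace kube-system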

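Once Cilium is running, hubble observe streams L3/L4 flows with source identity, destination service, and verdict (FORWARDED or DROPPED). It also streams L7 flows where L7 visibility is enabled, for example by an L7 network policy. The CLI reads from Hubble Relay, so expose the relay locally first, for example with cilium hubble port-forward: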

hubble observe --follow \
  --namespace production \
  --output json \
  | jq '{src: .source.labels, dst: .destination.labels, verdict: .verdict}'

Ship this output to a log backend (for example Loki or Splunk) and build a dependency graph from it. Commit the graph as JSON to your IaC repository so boundary changes appear in pull request diffs.
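The Python below is a minimal sketch of the graph-building step. It turns the JSON flow stream into a sorted, deduplicated edge list and diffs two snapshots.

It makes several assumptions:

  • Each input line is one flow object.
  • The source and destination each carry a namespace field and a labels list in Hubble's k8s:app=<name> form.
  • The verdict is a string such as FORWARDED.

Keying services on the app label is an illustrative choice, not part of Hubble. The same goes for the function names build_graph and diff_graphs.

```python
import json
from typing import Iterable


def service_key(endpoint: dict) -> str:
    """Return 'namespace/app' for a Hubble flow endpoint ('external' if it has neither)."""
    namespace = endpoint.get("namespace", "")
    for label in endpoint.get("labels", []):
        # Hubble prefixes Kubernetes labels with their source, e.g. "k8s:app=order-processor"
        if label.startswith("k8s:app="):
            return f"{namespace}/{label.split('=', 1)[1]}"
    return f"{namespace}/unknown" if namespace else "external"


def build_graph(lines: Iterable[str]) -> dict:
    """Collapse forwarded flows (one JSON object per line) into a sorted, deduplicated edge list."""
    edges = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        flow = json.loads(line)
        if flow.get("verdict") != "FORWARDED":
            continue  # dropped flows are blocked paths, not dependencies
        edges.add((service_key(flow.get("source", {})), service_key(flow.get("destination", {}))))
    return {"edges": [list(edge) for edge in sorted(edges)]}


def diff_graphs(old: dict, new: dict) -> dict:
    """List edges added or removed between two committed graph snapshots."""
    old_edges = {tuple(edge) for edge in old["edges"]}
    new_edges = {tuple(edge) for edge in new["edges"]}
    return {"added": sorted(new_edges - old_edges), "removed": sorted(old_edges - new_edges)}
```

Feed build_graph(sys.stdin) from a bounded capture such as hubble observe --namespace production --output json --last 1000, not from the --follow stream. The output is sorted and deduplicated, so re-running it against the same set of paths yields an identical file. A pull request diff then shows only edges that appeared or disappeared.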

Instrument OpenTelemetry for Application-Layer Flows

eBPF sees network sockets; OpenTelemetry sees application semantics. Add the OTel SDK (or operator-injected auto-instrumentation) to every service. Then configure trace propagation headers (traceparent, X-B3-TraceId) so trace context survives HTTP, gRPC, and message-queue hops:

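# OTel Collector configuration (in the opentelemetry-collector Helm chart's
# values.yaml this block sits under the config: key)
receivers:
  otlp:
    protocols:
      grpc: {}
      http: {}
processors:
  batch: {}
  # Adds pod metadata such as k8s.namespace.name as resource attributes
  # (needs a collector distribution that includes k8sattributes, plus RBAC)
  k8sattributes: {}
  # k8s.namespace.name is a resource attribute, so copy it with the resource processor
  resource:
    attributes:
      - key: trust.zone
        from_attribute: k8s.namespace.name
        action: insert
exporters:
  otlp:
    endpoint: "tempo.observability:4317"
    tls:
      insecure: true  # assumes a plaintext Tempo receiver behind mesh mTLS
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [k8sattributes, resource, batch]
      exporters: [otlp]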

Aggregate traces in Grafana Tempo or Jaeger. Export the trace-derived service dependency graph at least weekly to the same repository as the Hubble graph, so reviewers see the application-layer and network-layer views of a boundary change together.
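The traceparent header is what lets a trace survive each boundary hop. In practice the OTel SDK's propagator parses and forwards it for you. The Python sketch below only illustrates the W3C Trace Context version-00 layout: version, 32-hex trace ID, 16-hex parent ID, and flags.

The sketch is deliberately stricter than the spec and rejects any version other than 00. The names parse_traceparent and child_traceparent are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Version-00 layout: 2-hex version, 32-hex trace-id, 16-hex parent-id, 2-hex flags (lowercase)
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it should be ignored (start a new trace instead)."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    if version != "00":
        return None  # simplification: the spec allows forward-compatible parsing of newer versions
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return TraceParent(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def child_traceparent(parent: TraceParent, span_id: str) -> str:
    """Header to forward downstream: same trace ID, this service's span ID as the new parent."""
    return f"00-{parent.trace_id}-{span_id}-{'01' if parent.sampled else '00'}"
```

A service that receives a malformed or all-zero header should start a new trace rather than trust it. That is why the parser returns None instead of raising.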

Classify Each Discovered Flow

Map each flow to a trust classification before writing policies:

| Classification | Characteristics | Required controls |
|---|---|---|
| Internal Trusted | Same namespace, same SPIFFE trust domain | mTLS enforced, no extra validation |
| Cross-Boundary | Different namespaces or VPCs, external APIs | Token verification, rate limiting, payload sanitisation |
| Ephemeral/Event-Driven | Serverless triggers, message queues | IAM role constraints, event schema validation, source identity check |
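The classification can be applied mechanically to each discovered edge. This Python sketch encodes the table and treats "same namespace and same SPIFFE trust domain" as the test for Internal Trusted. The Flow fields, the EVENT_SOURCES set, and the control names are assumptions; adapt them to your own inventory.

```python
from dataclasses import dataclass
from enum import Enum


class TrustClass(Enum):
    INTERNAL_TRUSTED = "Internal Trusted"
    CROSS_BOUNDARY = "Cross-Boundary"
    EPHEMERAL_EVENT_DRIVEN = "Ephemeral/Event-Driven"


# Hypothetical source kinds treated as event-driven; extend to match your inventory
EVENT_SOURCES = {"lambda", "cloud-function", "queue", "topic"}

REQUIRED_CONTROLS = {
    TrustClass.INTERNAL_TRUSTED: ["mTLS enforced"],
    TrustClass.CROSS_BOUNDARY: ["token verification", "rate limiting", "payload sanitisation"],
    TrustClass.EPHEMERAL_EVENT_DRIVEN: ["IAM role constraints", "event schema validation",
                                        "source identity check"],
}


@dataclass(frozen=True)
class Flow:
    src_namespace: str
    dst_namespace: str
    src_trust_domain: str  # trust domain of the caller's SPIFFE ID; "" if it has none
    dst_trust_domain: str
    src_kind: str = "pod"  # e.g. "pod", "lambda", "queue", "external"


def classify(flow: Flow) -> TrustClass:
    # Event-driven sources are checked first: there is no persistent connection to pin down
    if flow.src_kind in EVENT_SOURCES:
        return TrustClass.EPHEMERAL_EVENT_DRIVEN
    same_namespace = flow.src_namespace == flow.dst_namespace
    same_domain = bool(flow.src_trust_domain) and flow.src_trust_domain == flow.dst_trust_domain
    if same_namespace and same_domain:
        return TrustClass.INTERNAL_TRUSTED
    # Other namespaces or VPCs, external APIs, and callers without an SVID all cross a boundary
    return TrustClass.CROSS_BOUNDARY
```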

Step 2: Enforce Zero-Trust Communication with mTLS and NetworkPolicy

Mapping boundaries is ineffective without cryptographic enforcement. The diagram below shows the target state: every hop carries a verified SPIFFE identity, and NetworkPolicy blocks unapproved paths before packets reach the service mesh.

Figure: Cloud-native trust boundary enforcement architecture. The diagram shows three trust zones:

  • Untrusted ingress: an internet client with no SVID, and an API gateway that terminates JWTs.
  • Trusted production namespace: order-processor and payment-service, each holding an X.509 SVID issued by the SPIRE Agent.
  • External SaaS API: reachable on egress port 443 only.

Permitted hops are labelled JWT verify and mTLS. A red dashed path shows direct pod access from ingress, blocked by NetworkPolicy at the trust boundary (NetworkPolicy + OPA).

Issue SPIFFE Identities with SPIRE

Install SPIRE on the Kubernetes cluster and register workload entries so every pod receives a short-lived X.509 SVID:

# Register the order-processor workload.
# parentID assumes a node alias registered earlier with spire-server entry create -node
spire-server entry create \
  -spiffeID spiffe://prod.example.com/ns/production/sa/order-processor \
  -parentID spiffe://prod.example.com/agent/k8s-agent \
  -selector k8s:ns:production \
  -selector k8s:sa:order-processor \
  -x509SVIDTTL 3600
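The ID registered above follows the Kubernetes convention spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>, which Istio also uses. SPIFFE itself mandates only the spiffe:// scheme and a trust domain; the path layout is a convention.

Below is a small Python sketch for building and parsing IDs in that layout. It is useful for checking that registration entries and authorization rules agree. The names WorkloadID, spiffe_id_for, and parse_spiffe_id are hypothetical.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class WorkloadID(NamedTuple):
    trust_domain: str
    namespace: str
    service_account: str


def spiffe_id_for(trust_domain: str, namespace: str, service_account: str) -> str:
    """Build the ID a registration entry is expected to carry for a namespace/service account."""
    return f"spiffe://{trust_domain}/ns/{namespace}/sa/{service_account}"


def parse_spiffe_id(spiffe_id: str) -> WorkloadID:
    """Split spiffe://<trust-domain>/ns/<namespace>/sa/<service-account> into its parts."""
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe" or not parts.netloc:
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    # SPIFFE IDs carry no port, user info, query, or fragment
    if ":" in parts.netloc or "@" in parts.netloc or parts.query or parts.fragment:
        raise ValueError(f"malformed SPIFFE ID: {spiffe_id!r}")
    segments = parts.path.split("/")
    # Expected: ["", "ns", <namespace>, "sa", <service-account>]
    if (len(segments) != 5 or segments[1] != "ns" or segments[3] != "sa"
            or not segments[2] or not segments[4]):
        raise ValueError(f"path does not follow /ns/<ns>/sa/<sa>: {spiffe_id!r}")
    return WorkloadID(parts.netloc, segments[2], segments[4])
```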

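Configure Istio to obtain workload certificates from SPIRE; Istio integrates with SPIRE through the SPIRE Agent's SDS API. Then set PeerAuthentication to STRICT so plaintext fallback is disabled. The resource below covers the production namespace. Creating the same resource in the mesh root namespace (istio-system by default) makes STRICT mode mesh-wide: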

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT

Apply NetworkPolicy for Layer 3/4 Enforcement

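The NetworkPolicy below restricts the order-processor pod to accept traffic only from API gateway pods on port 8443. It limits egress to the internal 10.0.0.0/8 range on port 443, plus cluster DNS. This enforces the boundary at the network layer, independent of the service mesh.

Note that a podSelector without a namespaceSelector only matches pods in the policy's own namespace. This rule therefore admits the gateway only if it runs in production:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: order-processor-boundary
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: order-processor
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        # Matches api-gateway pods in this namespace only; add a namespaceSelector
        # to the same entry if the gateway runs in another namespace
        - podSelector:
            matchLabels:
              app: api-gateway
      ports:
        - protocol: TCP
          port: 8443
  egress:
    - to:
        - ipBlock:
            cidr: 10.0.0.0/8
      ports:
        - protocol: TCP
          port: 443
    # With Egress enforced, DNS lookups fail unless allowed explicitly
    # (assumes the common k8s-app: kube-dns label on the cluster DNS pods)
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53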
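NetworkPolicy selector semantics are easy to misread. The Python below is a simplified model of how an ingress rule decides whether a source pod may connect.

The model has these limits:

  • It handles matchLabels only, not matchExpressions, ipBlock, named ports, or non-TCP protocols.
  • It assumes the policy already selects the target pod.
  • The function names are hypothetical.

Run against the order-processor ingress rule, the model shows two things. The podSelector-only peer admits api-gateway pods in production but not a gateway in another namespace. An empty ingress rule admits everything.

```python
from typing import Mapping, Optional


def selector_matches(match_labels: Optional[Mapping[str, str]], labels: Mapping[str, str]) -> bool:
    """matchLabels semantics: every listed key/value must be present; an empty selector matches all."""
    return all(labels.get(key) == value for key, value in (match_labels or {}).items())


def peer_admits(peer: dict, policy_ns: str, src_ns: str,
                src_ns_labels: Mapping[str, str], src_pod_labels: Mapping[str, str]) -> bool:
    """Evaluate one entry of an ingress 'from' list against a source pod."""
    if "ipBlock" in peer:
        raise NotImplementedError("ipBlock peers are not modelled in this sketch")
    pod_sel = peer.get("podSelector")
    ns_sel = peer.get("namespaceSelector")
    if pod_sel is None and ns_sel is None:
        raise ValueError("peer has no selector")
    if ns_sel is None:
        # A podSelector on its own only matches pods in the policy's own namespace
        ns_ok = src_ns == policy_ns
    else:
        ns_ok = selector_matches(ns_sel.get("matchLabels"), src_ns_labels)
    pod_ok = pod_sel is None or selector_matches(pod_sel.get("matchLabels"), src_pod_labels)
    return ns_ok and pod_ok


def ingress_admits(policy: dict, src_ns: str, src_ns_labels: Mapping[str, str],
                   src_pod_labels: Mapping[str, str], port: int) -> bool:
    """True if any ingress rule admits the source on TCP `port` (target pod assumed selected)."""
    policy_ns = policy["metadata"]["namespace"]
    for rule in policy["spec"].get("ingress") or []:
        ports = rule.get("ports")
        # No ports listed means all ports; an entry without 'port' means all ports for its protocol
        port_ok = not ports or any(
            p.get("port") in (None, port) and p.get("protocol", "TCP") == "TCP" for p in ports)
        peers = rule.get("from")
        # No 'from' list means any source: this is how an empty rule becomes allow-all
        peer_ok = not peers or any(
            peer_admits(p, policy_ns, src_ns, src_ns_labels, src_pod_labels) for p in peers)
        if port_ok and peer_ok:
            return True
    return False
```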

When handling requests that traverse trust zones, the receiving service must also validate at the application layer: verify the SPIFFE ID or JWT signature against the trusted issuer, check aud and sub claims, reject expired tokens, and log every boundary crossing with a correlation ID.
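Below is a minimal Python sketch of those application-layer checks. It assumes a JWT library has already verified the token's signature against the trusted issuer's keys; this code only inspects the decoded claims.

The issuer URL, audience, subject allowlist, the 30-second leeway, and the names validate_crossing and BoundaryRejected are illustrative. For JWT-SVIDs the sub claim is the caller's SPIFFE ID, so the allowlist can hold SPIFFE IDs directly.

```python
import logging
import time
from typing import Any, Mapping, Optional

log = logging.getLogger("trust_boundary")


class BoundaryRejected(Exception):
    """Raised when a request may not cross into this trust zone."""


def _check(claims: Mapping[str, Any], issuer: str, audience: str,
           allowed_subjects: set, now: float, leeway: int) -> str:
    if claims.get("iss") != issuer:
        raise BoundaryRejected("untrusted issuer")
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if audience not in audiences:
        raise BoundaryRejected("audience mismatch")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now > exp + leeway:
        raise BoundaryRejected("token expired or missing exp")
    nbf = claims.get("nbf")
    if isinstance(nbf, (int, float)) and now + leeway < nbf:
        raise BoundaryRejected("token not yet valid")
    sub = claims.get("sub")
    if sub not in allowed_subjects:
        raise BoundaryRejected("subject not allowed")
    return sub


def validate_crossing(claims: Mapping[str, Any], *, issuer: str, audience: str,
                      allowed_subjects: set, correlation_id: str,
                      now: Optional[float] = None, leeway: int = 30) -> str:
    """Validate claims of a token whose signature was ALREADY verified; log the outcome."""
    now = time.time() if now is None else now
    try:
        sub = _check(claims, issuer, audience, allowed_subjects, now, leeway)
    except BoundaryRejected as exc:
        # Rejected crossings are logged too, with the same correlation ID
        log.warning("boundary crossing rejected: %s sub=%s correlation_id=%s",
                    exc, claims.get("sub"), correlation_id)
        raise
    log.info("boundary crossing accepted sub=%s aud=%s correlation_id=%s",
             sub, audience, correlation_id)
    return sub
```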


Step 3: Gate Boundary Drift in CI/CD with Policy-as-Code

Boundary drift occurs when infrastructure changes bypass security review. Embedding policy gates directly into the deployment pipeline prevents unauthorized topology modifications from reaching production.

OPA/Rego Policy for Cross-Namespace Traffic

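The following Rego policy validates Kubernetes NetworkPolicy manifests. It denies any ingress rule that admits every source, which silently collapses a trust boundary. There are two such patterns:

  • A rule with no from selectors at all; an empty rule such as ingress: [{}] admits all traffic.
  • A rule with an empty namespaceSelector, which matches every namespace.

A policy that lists Ingress in policyTypes but has no ingress rules is the opposite case: it denies all ingress, so it passes.

package kubernetes.cross_namespace_policy

import rego.v1

# Conftest evaluates deny/warn/violation rules. An ingress rule with no
# "from" peers (absent or empty) admits traffic from any source.
allows_any_source(rule) if not rule.from

allows_any_source(rule) if count(rule.from) == 0

deny contains msg if {
    input.kind == "NetworkPolicy"
    some i, rule in input.spec.ingress
    allows_any_source(rule)
    msg := sprintf("%s: ingress rule %d has no source selectors and admits all traffic (implicit trust boundary violation).", [input.metadata.name, i])
}

# An empty namespaceSelector matches every namespace in the cluster
deny contains msg if {
    input.kind == "NetworkPolicy"
    some i, rule in input.spec.ingress
    some peer in rule.from
    peer.namespaceSelector == {}
    msg := sprintf("%s: ingress rule %d uses an empty namespaceSelector and admits traffic from every namespace.", [input.metadata.name, i])
}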

CI/CD Pipeline Gate

Wire the policy check into your pipeline so boundary drift is caught at the pull request stage, not post-deployment:

# .github/workflows/boundary-policy.yml
name: Trust Boundary Policy Gate
on: [pull_request]

jobs:
  boundary-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Conftest and Checkov
        run: |
          curl -sSfL https://github.com/open-policy-agent/conftest/releases/download/v0.51.0/conftest_0.51.0_Linux_x86_64.tar.gz | tar xz
          sudo mv conftest /usr/local/bin/
          pipx install checkov

      - name: Lint IaC with Checkov
        run: checkov -d ./infra --compact

      - name: Validate NetworkPolicy manifests
        # Conftest only evaluates the "main" package unless told otherwise
        run: |
          conftest test k8s/network-policies/ \
            --policy policy/cross_namespace_policy.rego \
            --all-namespaces \
            --output table

      - name: Validate Terraform IAM boundary
        run: |
          conftest test infra/iam/ \
            --policy policy/iam_boundary.rego \
            --all-namespaces \
            --output table

Cloud IAM Permission Boundaries for Serverless

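Event-driven boundaries in serverless architectures are defined by IAM roles, not persistent connections. Attach a permissions boundary to cap what a Lambda can do even if its execution role is over-permissive. The boundary grants nothing on its own: the function can perform only actions that both its identity policies and the boundary allow.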

resource "aws_iam_role" "lambda_execution" {
  name = "order-processor-lambda-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "lambda.amazonaws.com" }
      Condition = {
        StringEquals = { "aws:RequestedRegion" = "us-east-1" }
      }
    }]
  })

  permissions_boundary = aws_iam_policy.lambda_boundary.arn
}

resource "aws_iam_policy" "lambda_boundary" {
  name = "order-processor-boundary"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ]
      Resource = "arn:aws:logs:us-east-1:*:log-group:/aws/lambda/order-processor-*"
    }]
  })
}
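To see why the boundary caps permissions rather than granting them, here is a simplified Python model of the evaluation. An action is usable only if both the role's identity policy and the boundary allow it. The sketch ignores explicit Deny statements, resources, conditions, SCPs, and session policies. The function names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def _matches(patterns: Iterable[str], action: str) -> bool:
    # IAM action names are case-insensitive and support '*' and '?' wildcards
    return any(fnmatchcase(action.lower(), pattern.lower()) for pattern in patterns)


def effectively_allowed(action: str, identity_allows: Iterable[str],
                        boundary_allows: Iterable[str]) -> bool:
    """Allowed only if BOTH an identity policy and the permissions boundary allow the action."""
    return _matches(identity_allows, action) and _matches(boundary_allows, action)
```

With the boundary above, an identity policy granting s3:* still leaves the function unable to call S3. Meanwhile logs:PutLogEvents is usable only if an identity policy also grants it.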

Verification

After applying the controls above, confirm that boundary enforcement is active using these checks:

# 1. Confirm mTLS is enforced — expect "STRICT" in every row
kubectl get peerauthentication -A -o json \
  | jq '.items[] | {ns: .metadata.namespace, mode: .spec.mtls.mode}'

# 2. Confirm NetworkPolicy blocks uninvited traffic
# Run a throwaway pod in a different namespace and attempt to reach order-processor
kubectl run probe -n staging -i --rm --image=busybox --restart=Never -- \
  wget -qO- -T 3 http://order-processor.production:8443/health
# Expected: the request times out; NetworkPolicy drops packets rather than refusing the connection

# 3. Verify Hubble flow logs show DROPPED verdict for the probe attempt
hubble observe --namespace production --verdict DROPPED --last 20

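# 4. Validate SPIFFE identity issuance: decode the certificate held by order-processor's
# sidecar and read its URI SAN (the SPIFFE ID is in the SAN, not the subject)
POD=$(kubectl get pod -n production -l app=order-processor -o jsonpath='{.items[0].metadata.name}')
istioctl proxy-config secret "$POD" -n production -o json \
  | jq -r '.dynamicActiveSecrets[0].secret.tlsCertificate.certificateChain.inlineBytes' \
  | base64 --decode \
  | openssl x509 -noout -ext subjectAltName
# Expected output includes: URI:spiffe://prod.example.com/ns/production/sa/order-processor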

# 5. Run Conftest against current manifests to confirm no policy regressions
conftest test k8s/network-policies/ \
  --policy policy/cross_namespace_policy.rego \
  --all-namespaces \
  --output table

Compliance evidence collection: export Hubble flow logs and Conftest evaluation reports to immutable storage (AWS S3 Object Lock, or a GCS bucket with a locked retention policy) and reference them in your threat model documentation.

| Framework | Control | Satisfied by |
|---|---|---|
| SOC 2 | CC6.1 | Hubble flow logs, PeerAuthentication STRICT, OPA evaluation records |
| OWASP ASVS | 1.1.2 | Documented data-flow diagram committed to the IaC repo, policy-as-code CI gate |
| NIST SP 800-207 | §3.3 | SPIFFE identities, mutual authentication of every connection via mTLS |
| PCI DSS | Req. 1.2.1 | Explicit-deny NetworkPolicy manifests in version control, CI gate blocks drift |

Troubleshooting

| Symptom | Likely cause | Diagnosis and fix |
|---|---|---|
| Pods can still reach each other after applying NetworkPolicy | The CNI does not enforce NetworkPolicy (e.g. plain flannel) | Run kubectl get pods -n kube-system and confirm a policy-enforcing CNI (Cilium, Calico, Weave Net) is present. Replace the CNI, or add Calico for policy enforcement alongside flannel (the Canal setup). |
| SPIRE fails to issue SVIDs (context deadline exceeded) | The SPIRE Agent cannot reach the SPIRE Server | Check kubectl logs -n spire daemonset/spire-agent. Verify the spire-server service is reachable and that the agent's trust_domain matches the server config. |
| Conftest reports PASS on a manifest that should fail | The rules live in a package Conftest does not evaluate | Conftest only evaluates the main package by default. Pass --namespace with the package name from the .rego file (e.g. kubernetes.cross_namespace_policy), or use --all-namespaces. conftest test --trace shows which rules were evaluated. |
| mTLS causes connection reset errors during cert rotation | In-flight connections dropped at the renewal boundary | Keep the SPIRE SVID TTL at 1 hour or longer so rotations are infrequent. Check the SPIRE Agent logs to confirm SVIDs are renewed well before they expire. Implement retry logic with exponential backoff in services. |
| hubble observe shows no flows despite traffic | Hubble Relay not running, the CLI not connected to the relay, or Cilium monitor queue overflow | Check kubectl get pods -n kube-system -l k8s-app=hubble-relay, and run hubble status to confirm the CLI can reach the relay. If flows are lost under load, increase --monitor-queue-size in the Cilium DaemonSet args. |
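For the retry advice in the certificate-rotation row, here is a minimal Python sketch of capped exponential backoff with full jitter. It assumes resets surface as ConnectionError subclasses (such as ConnectionResetError) and that the wrapped operation is safe to repeat. The parameters and the name retry_with_backoff are illustrative defaults, not tuned values.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(op: Callable[[], T], *, attempts: int = 5, base: float = 0.1,
                       cap: float = 5.0,
                       retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
                       sleep: Callable[[float], None] = time.sleep,
                       rng: Optional[random.Random] = None) -> T:
    """Call op(), retrying transient failures with capped exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    rng = rng or random.Random()
    for attempt in range(attempts):
        try:
            return op()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: wait a random time up to the capped exponential delay
            delay = min(cap, base * (2 ** attempt))
            sleep(rng.uniform(0, delay))
    raise AssertionError("unreachable")
```

Full jitter spreads retries from many clients, so they do not all reconnect at the same moment once a rotation completes. Only wrap idempotent calls.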