Mapping Trust Boundaries in Cloud-Native Apps
Cloud-native architectures have eliminated the static network perimeter, replacing it with ephemeral, dynamically orchestrated workloads where containers, serverless functions, and managed services communicate across transient paths. Without explicit boundary mapping, lateral movement after an initial compromise can reach every service in a namespace. This guide shows you how to discover, enforce, and continuously validate those boundaries in production — as part of the Defining Trust Boundaries discipline inside Threat Modeling Fundamentals & Methodology.
Prerequisites
- A Kubernetes cluster (1.27+) with kubectl access, or an equivalent cloud-managed offering (EKS, GKE, AKS)
- Helm 3 installed for deploying Cilium or Istio
- opa and conftest CLI tools available in your CI environment
- Terraform or equivalent IaC tooling for cloud IAM resources
- Familiarity with attack surface mapping techniques and basic YAML manifests
Expected Outcomes
- A live, version-controlled service dependency graph that reflects runtime topology rather than stale architecture diagrams
- Cryptographic identity (SPIFFE SVIDs) on every workload with mTLS enforced at the mesh layer
- OPA policy gates in CI/CD that block boundary-violating deployments before they reach production
- Automated compliance evidence mapped to SOC 2 CC6.1, OWASP ASVS 1.1.2, and PCI DSS Requirement 1.2.1
Step 1: Discover Live Data Flows with eBPF and OpenTelemetry
Trust boundaries in cloud environments are defined by data flow, not network topology. Accurate mapping requires tracing requests across API gateways, service meshes, event buses, and serverless triggers in real time — not from a whiteboard.
Deploy eBPF Probes
Install Cilium as the CNI, or layer Tetragon on top of an existing CNI, to intercept kernel-level socket operations. eBPF captures process-to-process communication at the node level without a sidecar, so no per-pod injection is needed. The agents run as DaemonSets, so this approach needs node pools that permit DaemonSets; fully managed serverless pools such as AWS Fargate do not.
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true
# Tetragon ships as its own chart in the same repository
helm install tetragon cilium/tetragon --namespace kube-system
Once running, hubble observe streams every L3/L4/L7 flow with source identity, destination service, and verdict (forwarded or dropped):
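# Recent Hubble releases nest each record under .flow; the fallback handles the flat form
hubble observe --follow \
  --namespace production \
  --output json \
  | jq '(.flow // .) | {src: .source.labels, dst: .destination.labels, verdict: .verdict}'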
Pipe this output to a collector (e.g. Loki or Splunk) and build a dependency graph. Commit the graph as JSON to your IaC repository so boundary changes appear in pull requests.
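One way to turn the flow stream into a graph you can commit is sketched below. It is a minimal illustration, not a Hubble API: the record shape (a `flow` wrapper with `source`, `destination`, `labels`, and `verdict` fields), the use of the `k8s:app=` label as the workload name, and the function names are all assumptions to adapt to your own export.

```python
import json


def _app_label(endpoint):
    """Return the value of the k8s:app label, or "unknown" (label choice is an assumption)."""
    for label in endpoint.get("labels", []):
        if label.startswith("k8s:app="):
            return label.split("=", 1)[1]
    return "unknown"


def build_dependency_graph(lines):
    """Collapse newline-delimited flow records into a sorted list of unique edges."""
    edges = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Accept both the wrapped ({"flow": {...}}) and the flat record form.
        flow = record.get("flow", record)
        src = flow.get("source", {})
        dst = flow.get("destination", {})
        edges.add((
            src.get("namespace", "unknown"),
            _app_label(src),
            dst.get("namespace", "unknown"),
            _app_label(dst),
            flow.get("verdict", "UNKNOWN"),
        ))
    return [
        {"src": f"{s_ns}/{s_app}", "dst": f"{d_ns}/{d_app}", "verdict": verdict}
        for s_ns, s_app, d_ns, d_app, verdict in sorted(edges)
    ]


def render(graph):
    """Serialise deterministically so repeated exports produce stable diffs."""
    return json.dumps(graph, indent=2, sort_keys=True) + "\n"
```

The sketch stores unique edges rather than flow counts on purpose: counts change on every export and would bury real topology changes in pull request noise.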
Instrument OpenTelemetry for Application-Layer Flows
eBPF sees network sockets; OpenTelemetry sees application semantics. Inject the OTel SDK into every service and configure trace propagation headers (traceparent, X-B3-TraceId) across HTTP, gRPC, and event payloads:
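# values.yaml excerpt for OTel Collector deployment
receivers:
  otlp:
    protocols:
      grpc: {}
      http: {}
exporters:
  otlp:
    endpoint: "tempo.observability:4317"
processors:
  batch: {}
  attributes:
    actions:
      # Copies the namespace into trust.zone. This processor reads span
      # attributes; if k8s.namespace.name is only a resource attribute,
      # apply the same action with the resource processor instead.
      - key: trust.zone
        from_attribute: k8s.namespace.name
        action: insert
service:
  pipelines:
    traces:
      receivers: [otlp]
      # batch runs last so it batches spans that are already enriched
      processors: [attributes, batch]
      exporters: [otlp]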
Aggregate traces in Grafana Tempo or Jaeger. Export service dependency graphs weekly to a version-controlled repository so boundary changes appear in pull request diffs.
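To make the propagation step concrete, the sketch below shows what a service does with the W3C `traceparent` header: keep the trace ID, mint a new span ID, forward the flags. In practice the OTel SDK propagators do this for you; the function names here are hypothetical and the code handles only the version `00` layout.

```python
import re
import secrets

# version - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex), lowercase only
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(header):
    """Return (trace_id, parent_span_id, flags), or None if the header is invalid."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the W3C Trace Context spec.
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return trace_id, span_id, flags


def outgoing_traceparent(incoming):
    """Build the header for a downstream call, continuing the trace when possible."""
    parsed = parse_traceparent(incoming) if incoming else None
    if parsed is None:
        # No usable context: start a new, sampled trace.
        trace_id, flags = secrets.token_hex(16), "01"
    else:
        trace_id, _parent_span_id, flags = parsed
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

Because the trace ID survives every hop, a single trace can be followed across trust zones, which is what lets the collector attribute a request to each boundary it crossed.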
Classify Each Discovered Flow
Map each flow to a trust classification before writing policies:
| Classification | Characteristics | Required Controls |
|---|---|---|
| Internal Trusted | Same namespace, shared SPIFFE trust domain | mTLS enforced, no extra validation |
| Cross-Boundary | Different namespaces or VPCs, external APIs | Token verification, rate limiting, payload sanitisation |
| Ephemeral/Event-Driven | Serverless triggers, message queues | IAM role constraints, event schema validation, source identity check |
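The table can be applied mechanically once flows are recorded. The sketch below is one possible encoding: the `Flow` fields and the transport names are hypothetical, and the control lists are copied from the table above.

```python
from dataclasses import dataclass
from enum import Enum


class TrustClass(Enum):
    INTERNAL_TRUSTED = "Internal Trusted"
    CROSS_BOUNDARY = "Cross-Boundary"
    EPHEMERAL = "Ephemeral/Event-Driven"


@dataclass(frozen=True)
class Flow:
    src_namespace: str
    dst_namespace: str
    transport: str              # hypothetical values: "http", "grpc", "queue", "serverless-trigger"
    dst_external: bool = False  # True for external APIs or another VPC


EVENT_TRANSPORTS = {"queue", "serverless-trigger"}

REQUIRED_CONTROLS = {
    TrustClass.INTERNAL_TRUSTED: ["mTLS enforced"],
    TrustClass.CROSS_BOUNDARY: ["token verification", "rate limiting", "payload sanitisation"],
    TrustClass.EPHEMERAL: ["IAM role constraints", "event schema validation", "source identity check"],
}


def classify(flow):
    """Pick the classification for a flow, checking the strictest cases first."""
    if flow.transport in EVENT_TRANSPORTS:
        return TrustClass.EPHEMERAL
    if flow.dst_external or flow.src_namespace != flow.dst_namespace:
        return TrustClass.CROSS_BOUNDARY
    return TrustClass.INTERNAL_TRUSTED
```

For example, `classify(Flow("production", "payments", "grpc"))` yields `TrustClass.CROSS_BOUNDARY`, and `REQUIRED_CONTROLS` then lists the controls that flow needs before a policy is written for it.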
Step 2: Enforce Zero-Trust Communication with mTLS and NetworkPolicy
Mapping boundaries is ineffective without cryptographic enforcement. In the target state, every hop carries a verified SPIFFE identity, and NetworkPolicy blocks unapproved paths before packets reach the service mesh.
Issue SPIFFE Identities with SPIRE
Install SPIRE on the Kubernetes cluster and register workload entries so every pod receives a short-lived X.509 SVID:
# Register the order-processor workload
spire-server entry create \
  -spiffeID spiffe://prod.example.com/ns/production/sa/order-processor \
  -parentID spiffe://prod.example.com/agent/k8s-agent \
  -selector k8s:ns:production \
  -selector k8s:sa:order-processor \
  -ttl 3600
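Configure Istio or Linkerd to use SPIRE as the external CA, then set peer authentication to STRICT so plaintext fallback is disabled. The Istio resource below covers the production namespace; applying the same resource in the Istio root namespace (istio-system by default) extends it mesh-wide: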
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT
Apply NetworkPolicy for Layer 3/4 Enforcement
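The NetworkPolicy below enforces the boundary at the network layer, independent of the service mesh. It restricts the order-processor pods to accept traffic only from api-gateway pods in the same namespace on TCP 8443, and limits egress to the internal CIDR on port 443. Because the policy lists Egress in policyTypes, all other outbound traffic is dropped, including DNS lookups on port 53; add an egress rule for the cluster DNS service if the workload resolves names: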
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: order-processor-boundary
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: order-processor
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api-gateway
      ports:
        - protocol: TCP
          port: 8443
  egress:
    - to:
        - ipBlock:
            cidr: 10.0.0.0/8
      ports:
        - protocol: TCP
          port: 443
When handling requests that traverse trust zones, the receiving service must also validate at the application layer: verify the SPIFFE ID or JWT signature against the trusted issuer, check aud and sub claims, reject expired tokens, and log every boundary crossing with a correlation ID.
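The claim checks listed above can be sketched as a single function. This is an illustration under stated assumptions: `claims` is the payload of a token whose signature has already been verified against the trusted issuer by a JWT library (signature verification is deliberately out of scope here), and the function and parameter names are hypothetical.

```python
import logging
import time
import uuid

logger = logging.getLogger("boundary")


def validate_boundary_claims(claims, *, expected_issuer, expected_audience,
                             allowed_subjects, now=None, correlation_id=None):
    """Return (allowed, reason) for a signature-verified token payload."""
    now = time.time() if now is None else now
    correlation_id = correlation_id or str(uuid.uuid4())

    # "aud" may be a single string or a list of strings.
    aud = claims.get("aud")
    if isinstance(aud, str):
        audiences = [aud]
    elif isinstance(aud, (list, tuple)):
        audiences = list(aud)
    else:
        audiences = []

    exp = claims.get("exp")
    reason = None
    if claims.get("iss") != expected_issuer:
        reason = "untrusted issuer"
    elif expected_audience not in audiences:
        reason = "audience mismatch"
    elif claims.get("sub") not in allowed_subjects:
        reason = "subject not allowed"
    elif not isinstance(exp, (int, float)) or exp <= now:
        reason = "token expired or missing exp"

    allowed = reason is None
    # Log every crossing, allowed or not, so it can be joined with flow logs.
    logger.info("boundary_crossing correlation_id=%s sub=%s allowed=%s reason=%s",
                correlation_id, claims.get("sub"), allowed, reason or "ok")
    return allowed, reason
```

Passing the correlation ID in from the incoming request, rather than generating it here, keeps the log line joinable with the trace for the same request.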
Step 3: Gate Boundary Drift in CI/CD with Policy-as-Code
Boundary drift occurs when infrastructure changes bypass security review. Embedding policy gates directly into the deployment pipeline prevents unauthorized topology modifications from reaching production.
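Drift is easiest to reason about as a set difference between the reviewed topology and the proposed one. The sketch below assumes a hypothetical graph format, a JSON list of objects with `src` and `dst` keys, and hypothetical function names.

```python
import json


def edge_set(graph_json):
    """Parse a JSON list of {"src": ..., "dst": ...} objects into a set of edges."""
    return {(edge["src"], edge["dst"]) for edge in json.loads(graph_json)}


def boundary_drift(baseline_json, candidate_json):
    """Report edges that appear in, or disappear from, the candidate graph."""
    baseline = edge_set(baseline_json)
    candidate = edge_set(candidate_json)
    return {
        "added": sorted(candidate - baseline),
        "removed": sorted(baseline - candidate),
    }
```

A pipeline step could fail the build whenever `added` is non-empty, so every new communication path needs an explicit review before it merges.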
OPA/Rego Policy for Cross-Namespace Traffic
The following Rego policy validates Kubernetes NetworkPolicy manifests. It denies any policy that includes an implicit allow-all ingress rule, which would collapse a trust boundary silently:
package kubernetes.cross_namespace_policy

import rego.v1

# Default deny cross-namespace traffic.
# Conftest evaluates only the deny rules; allow is available to OPA queries.
default allow := false

# Both selectors must match on the same peer entry
allow if {
    input.kind == "NetworkPolicy"
    some rule in input.spec.ingress
    some peer in rule.from
    peer.namespaceSelector.matchLabels["kubernetes.io/metadata.name"] == "production"
    peer.podSelector.matchLabels["app"] == "api-gateway"
}

# An ingress rule with no "from" peers admits traffic from every source
deny contains msg if {
    input.kind == "NetworkPolicy"
    "Ingress" in input.spec.policyTypes
    some rule in input.spec.ingress
    count(object.get(rule, "from", [])) == 0
    msg := "Ingress rule missing explicit source selectors: implicit trust boundary violation."
}
CI/CD Pipeline Gate
Wire the policy check into your pipeline so boundary drift is caught at the pull request stage, not post-deployment:
# .github/workflows/boundary-policy.yml
name: Trust Boundary Policy Gate
on: [pull_request]
jobs:
  boundary-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Conftest
        run: |
          curl -L https://github.com/open-policy-agent/conftest/releases/download/v0.51.0/conftest_0.51.0_Linux_x86_64.tar.gz | tar xz
          sudo mv conftest /usr/local/bin/
      - name: Install Checkov
        run: pip install checkov
      - name: Lint IaC for network segmentation
        # Confirm these check IDs cover the segmentation rules you intend to enforce
        run: checkov -d ./infra --check CKV_K8S_1,CKV_K8S_8 --compact
      - name: Validate NetworkPolicy manifests
        # Without --all-namespaces, Conftest evaluates only the "main" package
        run: |
          conftest test k8s/network-policies/ \
            --policy policy/cross_namespace_policy.rego \
            --all-namespaces \
            --output table
      - name: Validate Terraform IAM boundary
        run: |
          conftest test infra/iam/ \
            --policy policy/iam_boundary.rego \
            --all-namespaces \
            --output table
Cloud IAM Permission Boundaries for Serverless
Event-driven boundaries in serverless architectures are defined by IAM roles, not persistent connections. Use permission boundaries to cap what a Lambda can do even if its execution role is over-permissive:
resource "aws_iam_role" "lambda_execution" {
  name = "order-processor-lambda-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "lambda.amazonaws.com" }
      Condition = {
        StringEquals = { "aws:RequestedRegion" = "us-east-1" }
      }
    }]
  })
  permissions_boundary = aws_iam_policy.lambda_boundary.arn
}

resource "aws_iam_policy" "lambda_boundary" {
  name = "order-processor-boundary"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ]
      Resource = "arn:aws:logs:us-east-1:*:log-group:/aws/lambda/order-processor-*"
    }]
  })
}
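A permission boundary grants nothing by itself: an action takes effect only when both the role's identity policy and the boundary allow it. The sketch below models only that intersection. It is a deliberately simplified illustration with hypothetical function names, and it ignores explicit Deny statements, resource ARNs, and conditions, all of which real IAM evaluation also considers.

```python
from fnmatch import fnmatchcase


def _matches(action, patterns):
    """IAM action names are case-insensitive and support * and ? wildcards."""
    return any(fnmatchcase(action.lower(), pattern.lower()) for pattern in patterns)


def effective_allow(action, role_actions, boundary_actions):
    """An action is effective only if the role policy AND the boundary both allow it."""
    return _matches(action, role_actions) and _matches(action, boundary_actions)


# An over-permissive execution role, capped by the boundary from the Terraform above.
ROLE_ACTIONS = ["logs:*", "s3:*"]
BOUNDARY_ACTIONS = ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"]
```

With these values, `effective_allow("logs:PutLogEvents", ROLE_ACTIONS, BOUNDARY_ACTIONS)` is true, while `s3:GetObject` and `logs:DeleteLogGroup` are both rejected even though the role policy allows them.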
Verification
After applying the controls above, confirm that boundary enforcement is active using these checks:
# 1. Confirm mTLS is enforced — expect "STRICT" in every row
kubectl get peerauthentication -A -o json \
| jq '.items[] | {ns: .metadata.namespace, mode: .spec.mtls.mode}'
# 2. Confirm NetworkPolicy blocks uninvited traffic
# Run a test pod in a different namespace and attempt to reach order-processor
kubectl run probe --image=busybox --restart=Never -n staging -- \
wget -qO- --timeout=3 http://order-processor.production:8443/health
# Expected: wget: can't connect to remote host — connection refused or timed out
# 3. Verify Hubble flow logs show DROPPED verdict for the probe attempt
hubble observe --namespace production --verdict DROPPED --last 20
# 4. Validate SPIFFE identity issuance (an X.509 SVID carries the SPIFFE ID in its URI SAN, not the subject)
kubectl exec -n production deploy/order-processor -- \
  /bin/sh -c 'echo | openssl s_client -connect payment-service:8443 2>/dev/null | openssl x509 -noout -ext subjectAltName'
# Expected: a SAN entry of URI:spiffe://prod.example.com/ns/production/sa/payment-service
# 5. Run Conftest against current manifests to confirm no policy regressions
#    (--all-namespaces is needed because the policy package is not "main")
conftest test k8s/network-policies/ \
  --policy policy/cross_namespace_policy.rego \
  --all-namespaces \
  --output table
Compliance evidence collection: export Hubble flow logs and Conftest evaluation reports to immutable storage (AWS S3 Object Lock, or a GCS bucket with a locked retention policy) and reference them in your threat model documentation.
| Framework | Control | Satisfied By |
|---|---|---|
| SOC 2 | CC6.1 | Hubble flow logs, PeerAuthentication STRICT, OPA evaluation records |
| OWASP ASVS | 1.1.2 | Documented data-flow diagram committed to IaC repo, policy-as-code CI gate |
| NIST SP 800-207 | §3.3 | SPIFFE identities, per-request authentication via mTLS |
| PCI DSS | Req 1.2.1 | Explicit-deny NetworkPolicy manifests in version control, CI gate blocks drift |
Troubleshooting
| Symptom | Likely Cause | Diagnosis & Fix |
|---|---|---|
| Pods can still reach each other after applying NetworkPolicy | CNI does not enforce NetworkPolicy (e.g. flannel without Calico overlay) | Run kubectl get pods -n kube-system and confirm a policy-enforcing CNI (Cilium, Calico, Weave) is present. Replace the CNI or install Calico as a network plugin alongside flannel. |
| SPIRE fails to issue SVIDs (context deadline exceeded) | SPIRE Agent cannot reach the SPIRE Server | Check kubectl logs -n spire daemonset/spire-agent. Verify the spire-server service is accessible and that the agent’s trust_domain matches the server config. |
| Conftest reports PASS on a manifest that should fail | Policy package not evaluated | Conftest evaluates only the main package by default, and Rego package names are not derived from file paths. Pass --namespace kubernetes.cross_namespace_policy or --all-namespaces, then run conftest test --trace to confirm the deny rules are evaluated. |
| mTLS causes connection reset errors during cert rotation | In-flight connections dropped at renewal boundary | Set SPIRE SVID TTL to 1 hour minimum and configure Istio gracefulShutdown.minConnectionAge to overlap with the rotation window. Implement retry logic with exponential backoff in services. |
| hubble observe shows no flows despite traffic | Hubble relay not running or Cilium monitor buffer overflow | Check kubectl get pods -n kube-system -l k8s-app=hubble-relay. Increase --monitor-queue-size in the Cilium DaemonSet args if flows are being dropped under load. |
Related
- Defining Trust Boundaries — parent guide covering trust zone theory and threat enumeration
- Automated Attack Surface Discovery with OWASP ZAP — complementary technique for exposing undocumented service endpoints
- How to Apply STRIDE to Microservices Architecture — apply STRIDE threat enumeration to each discovered flow
- Threat Model Documentation Patterns — store boundary maps as living compliance artefacts
- SSRF Prevention and Allowlisting — cross-pillar: SSRF exploits misconfigured egress boundaries directly