How We Helped a Healthcare Client With Kubernetes HIPAA Compliance
Last Updated on May 8, 2026
Quick Summary
This insight covers how Bacancy delivered a Kubernetes HIPAA Compliance project for a US healthcare SaaS client running clinical workloads on Amazon EKS. It includes the pre-project audit, the three-layer architecture we put in place, the testing process we followed, and the results from 90 days after the cluster went live.
Introduction
Around September 2025, a US-based healthcare SaaS company reached out to us. Their product is a patient management system used by small and mid-sized clinics for patient intake, charting, and care coordination. It runs on Amazon EKS, with about 50 microservices spread over four namespaces and roughly 2 million patient records in the system.
The application itself worked well, and the prerequisites for AWS HIPAA compliance were already in place. The BAA was signed, only HIPAA-eligible AWS services were in use, and foundational encryption and access controls were configured at the AWS account level. The issue was with HIPAA compliance in the Kubernetes cluster, where the application’s PHI workloads actually run.
An internal HIPAA audit completed two months earlier had flagged 11 controls in the Security Rule as “addressable but not implemented” on their cluster. None of those gaps was a problem under the existing HIPAA Security Rule, where addressable controls allow consideration of cost and risk. But the Notice of Proposed Rulemaking published by the Office for Civil Rights on January 6, 2025, changes that: the proposed update removes the addressable distinction and makes those 11 controls mandatory, with a 240-day compliance window after the final rule publishes.
Their cyber insurance company had separately called out the gaps during renewal and even asked for documented evidence of the controls before finalizing the policy.
Between the rule change and the insurer’s renewal terms, the client weighed three alternatives before calling us:
They could move from EKS to AWS Fargate with ECS. That would shift a portion of the HIPAA controls to AWS, but it would also mean rewriting deployments and CI/CD for all 50 services.
They could pay for a Kubernetes compliance platform like Wiz or Aqua to automate policy checks and evidence collection on the existing cluster. But someone would still have to close the gaps those tools flag.
They could use their in-house DevOps team to close the controls one at a time over six months, but the compliance officer wanted documented evidence well before that.
They came to us instead, and we took the work on a 12-week timeline.
This insight covers what the Kubernetes HIPAA compliance pre-project audit found, the architecture we chose, how we tested it, and the numbers the client measured 90 days after deployment.
Results From the Audit Process Before We Started With the Kubernetes HIPAA Compliance Project
Before starting the Kubernetes HIPAA compliance project, we ran a full audit of the cluster, the workloads, and every HIPAA Security Rule control, using our standard HIPAA compliance checklist as the reference. Here are the four findings that shaped the rest of the project.
1. etcd Encryption Was at the Volume Layer Only
EKS encrypts the EBS volumes hosting etcd by default. That protects against physical disk theft, which sits on AWS’s side of the shared responsibility model for the managed control plane. But it does not defend against an attacker who already has cluster admin, because volume encryption is transparent: by the time a request reaches etcd, the data coming off the disk is already decrypted. And the cluster had no second layer of encryption inside etcd itself.
A cluster-admin token was all it took. We ran kubectl get secret -o yaml against the four PHI namespaces and pulled back the values of 47 Kubernetes Secrets as base64, which is encoding, not encryption: database connection strings for two RDS instances holding PHI, an Okta client secret used for clinician SSO, and OAuth tokens for the EHR sync service. Anyone with cluster admin access could reach PHI within minutes.
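A minimal sketch of that check (the Secret, key, and namespace names below are placeholders, not the client’s actual objects):

```sh
# Illustrative only: with a cluster-admin token, a Secret value comes back
# base64-encoded rather than encrypted. All names are placeholders.
kubectl get secret db-credentials -n clinical-api \
  -o jsonpath='{.data.connection-string}' | base64 -d
```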
2. 23 HIPAA Security Rule Controls Mapped to K8s Objects, 11 Were Addressable-Only
We documented every technical safeguard in §164.312 along with the relevant administrative and physical safeguards from §164.308 and §164.310, and matched each one to the Kubernetes object responsible for it:
RBAC mapped to access control.
NetworkPolicies mapped to integrity and segmentation controls.
API server audit logs mapped to audit controls.
PV encryption mapped to encryption at rest.
23 controls applied to the cluster. 12 were properly implemented: TLS at the ingress, RBAC for two of the four namespaces, basic API server audit logging, and EBS encryption on PVs.
The remaining 11 were addressable-only:
MFA on cluster admin paths (engineers used personal AWS IAM keys with no enforced second factor)
Application-layer encryption of ePHI in etcd
Audit log retention beyond 7 days
Network segmentation between PHI and non-PHI namespaces
Vulnerability scanning on a defined frequency
Pod Security Standards enforcement
Per-pod workload identity (services authenticated through a shared IAM role)
Egress controls
Image provenance and admission policy
Secret rotation frequency
Backup encryption with separately managed keys
Under the HIPAA Security Rule, addressable controls give the client three options: implement them, put an equivalent alternative in place, or document in writing why the control isn’t reasonable for the environment. The client had done none of these.
Closing these 11 controls became the priority on our Kubernetes HIPAA compliance checklist for this project. Each went into a spreadsheet linked to its Security Rule citation, the relevant Kubernetes object, the owner, and the verification method.
The client’s compliance officer (the same person whose internal audit had flagged the gaps two months earlier) reviewed the plan and approved it. From there, we started work on the cluster.
3. PHI Flow Mapped Through 4 Namespaces and 17 Services
The third audit task in our Kubernetes HIPAA compliance project was tracing every place PHI lived, moved, or left the EKS cluster. Here’s what we found:
Four namespaces stored PHI: clinical-api, imaging-svc, notifications, and analytics-warehouse.
Seventeen services ran across those four namespaces, eleven of which read or wrote PHI directly.
Three storage classes carried PHI on disk: gp3 for primary clinical data, io2 for imaging, and an EFS-backed class for shared documents.
There were six entry points for PHI to enter the cluster:
the FHIR API at the ingress,
the EHR webhook handler,
internal service-to-service calls,
a scheduled overnight import job,
the image upload endpoint,
and an audit replay tool.
And four exit points where PHI left the cluster:
the outbound EHR sync,
the push notification service,
the Prometheus metrics pipeline,
and the scheduled data export to the analytics warehouse.
Every entry and exit became a verification point in the testing phase later in the project. If PHI moved through a path before the project, an equivalent path had to exist after, with encryption, audit, and access control matching or exceeding the original.
4. API Server Audit Log Retention Was 7 Days
The HIPAA Security Rule requires an audit trail of which user accessed which patient’s record, what they did with it, and when. The client met that requirement at the cluster level: API server audit logs were stored in CloudWatch with 7-day retention.
But the Notice of Proposed Rulemaking from January 2025 raises the bar. It requires the organization to determine retention through a documented risk analysis rather than accept a default. HIPAA’s existing documentation rule (§164.316(b)(2)(i)) already pushes the industry benchmark for audit-related records to six years, and 7 days falls well short of that benchmark, with no documented analysis to justify it.
There was a second gap. The kube-audit pipeline only recorded who ran the kubectl commands, not the clinicians reading patient records inside the application. A nurse opening a chart through the clinical UI does not show up in the API server audit logs. The application had its own audit table, but it was incomplete. Three of the 11 PHI-reading services logged writes only and skipped reads entirely.
The Kubernetes HIPAA Compliance Architecture We Chose
With the audit complete, we structured the Kubernetes HIPAA compliance work in three layers. Each layer had to pass its own verification before the next started.
1. Encryption Controls
The first layer covered encryption in two states:
Data at rest (Secrets in etcd, persistent volumes on disk) and
Data in motion (traffic between services in the cluster).
We enabled application-layer encryption for etcd using the AWS KMS provider for Kubernetes, with a dedicated KMS key per environment. The 47 existing Secrets then had to be re-encrypted: we read each value and wrote it back through the API server so it would land encrypted under the new key, then verified the rewrite by pulling the same Secrets back through the API server. The values were now encrypted under the new KMS key and readable only through the kube-apiserver decrypt path.
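A rough sketch of the two steps, assuming eksctl is used to attach the KMS key (the cluster name and key ARN are placeholders); the second command is the standard way to force every existing Secret to be rewritten, and therefore re-encrypted, through the API server:

```sh
# 1. Attach a customer-managed KMS key for envelope encryption of Secrets.
eksctl utils enable-secrets-encryption \
  --cluster my-eks-cluster \
  --key-arn arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID

# 2. Rewrite every existing Secret so it is stored encrypted under the new key.
kubectl get secrets --all-namespaces -o json | kubectl replace -f -
```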
Persistent volumes moved to KMS-backed encryption with a separate key per namespace. A Lambda function rotates each key every 90 days and posts a CloudWatch alert if a rotation fails.
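For the volume side, a hedged example of what a KMS-backed StorageClass for the EBS CSI driver can look like (the class name and key ARN are placeholders):

```yaml
# Sketch of a StorageClass that encrypts PHI volumes with a namespace-specific
# KMS key via the EBS CSI driver. Name and key ARN are placeholders.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-phi-clinical
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
  kmsKeyId: arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
```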
Static Kubernetes Secrets were replaced with the External Secrets Operator pulling from AWS Secrets Manager. The rotation schedule is set at the Secrets Manager layer, which means a rotation triggers a pod restart rather than a code change.
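A sketch of that pattern, assuming a ClusterSecretStore named aws-secrets-manager and placeholder secret names:

```yaml
# Hedged example: an ExternalSecret that pulls a database credential from AWS
# Secrets Manager into a PHI namespace. Store, key, and names are placeholders.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: clinical-db-credentials
  namespace: clinical-api
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: clinical-db-credentials
  data:
    - secretKey: connection-string
      remoteRef:
        key: prod/clinical-api/db-connection-string
```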
For internal traffic between services, we deployed Linkerd as the service mesh and let it manage automatic mTLS for every pod in PHI namespaces. We also set up Cert-manager to handle the certificate authority. The result: no service in a PHI namespace can talk to another without a valid mTLS handshake.
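Linkerd’s automatic mTLS is typically switched on by annotating the namespace so every pod gets the sidecar proxy injected; a minimal example, using one of the PHI namespaces named in the audit:

```yaml
# Hedged sketch: the linkerd.io/inject annotation tells Linkerd to inject its
# proxy (and with it automatic mTLS) into every pod created in this namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: clinical-api
  annotations:
    linkerd.io/inject: enabled
```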
2. Access Control and Audit Logging
We removed every personal AWS IAM key from the kubectl access path and put AWS Cognito in front as the OIDC provider. We turned on MFA at the Cognito user pool with TOTP as the second factor, and configured WebAuthn passkeys with user verification to also satisfy the MFA requirement. Engineers now authenticate to Cognito and receive a short-lived token, which the API server validates.
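A hedged sketch of wiring Cognito in as the cluster’s OIDC identity provider (cluster name, user pool ID, and client ID are placeholders); MFA enforcement itself lives in the Cognito user pool settings, not in this command:

```sh
# Associate a Cognito user pool with the EKS cluster as an OIDC identity provider.
# Groups from the cognito:groups claim can then be bound to RBAC roles.
aws eks associate-identity-provider-config \
  --cluster-name my-eks-cluster \
  --oidc identityProviderConfigName=cognito-oidc,issuerUrl=https://cognito-idp.us-east-1.amazonaws.com/us-east-1_EXAMPLE,clientId=EXAMPLECLIENTID,usernameClaim=email,groupsClaim=cognito:groups
```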
We rebuilt Kubernetes RBAC around nine roles that the compliance officer defined:
clinician-read,
clinician-write,
scribe,
billing,
cluster-admin,
audit-readonly,
sre-readonly,
sre-break-glass, and
ml-engineer-pseudonymized.
Each role has explicit RoleBindings per namespace. The sre-break-glass role exists for emergencies only, and triggers a Slack alert and a Splunk audit row whenever it is used.
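As a hedged illustration of how one of these roles can be expressed (the resource list, group name, and namespace below are placeholders, not the client’s actual definitions):

```yaml
# Sketch of a namespaced read-only role bound to an OIDC group claim.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: sre-readonly
  namespace: clinical-api
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "pods/log", "services", "deployments"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: sre-readonly-binding
  namespace: clinical-api
subjects:
  - kind: Group
    name: sre-readonly            # group claim passed through the OIDC token
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: sre-readonly
  apiGroup: rbac.authorization.k8s.io
```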
Next, we deployed OPA Gatekeeper with five policies that block deploys outright if violated:
No privileged containers in PHI namespaces
All images must come from the approved ECR registry
Every pod must declare a securityContext with runAsNonRoot: true and a read-only root filesystem
Every Service in a PHI namespace must have a corresponding NetworkPolicy
All pods must define resource requests and limits
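The first of these policies, blocking privileged containers in PHI namespaces, can be sketched with a constraint built on the community gatekeeper-library template (the template must be installed separately; the constraint below is an illustration, not the client’s actual policy):

```yaml
# Hedged sketch: a Gatekeeper constraint using the K8sPSPPrivilegedContainer
# template from the gatekeeper-library, scoped to the PHI namespaces.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPPrivilegedContainer
metadata:
  name: no-privileged-containers-phi
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces:
      - clinical-api
      - imaging-svc
      - notifications
      - analytics-warehouse
```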
We expanded the API server audit policy from the default Metadata level to RequestResponse for PHI namespaces, which captures the full request and response body on every API call. We deployed Fluent Bit to ship those logs to Splunk, increasing the retention period from 7 days to 365 days, and a parallel pipeline writes the same logs to S3 with Object Lock for the full six-year HIPAA documentation window.
3. Network Segmentation and Runtime Defense
The third layer covered traffic inside the cluster, traffic leaving the cluster, and what runs inside containers once they’re up.
We installed Calico as the CNI and applied a NetworkPolicy in every PHI namespace that blocks all pod-to-pod traffic by default. We then wrote allow rules naming the specific source pods, destination pods, and ports each service needs to function. Any connection attempt that doesn’t match an allow rule is dropped before it reaches the destination pod.
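A hedged sketch of the pattern: a default-deny policy plus one allow rule (pod labels, ports, and service names are illustrative, not the client’s actual values):

```yaml
# Default-deny for every pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: clinical-api
spec:
  podSelector: {}                 # applies to every pod in the namespace
  policyTypes: ["Ingress", "Egress"]
---
# One explicit allow rule: the clinical-api pods may reach the db-proxy pods on 5432.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-to-db-proxy
  namespace: clinical-api
spec:
  podSelector:
    matchLabels:
      app: clinical-api
  policyTypes: ["Egress"]
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: db-proxy
      ports:
        - protocol: TCP
          port: 5432
```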
We also wrote a Calico GlobalNetworkPolicy for outbound traffic. Traffic from PHI namespaces can now only reach four destinations:
the EHR partner’s API endpoint,
the Splunk SIEM,
AWS KMS,
and the AWS API endpoints needed for IAM and Secrets Manager.
Every other outbound connection is blocked at the pod network layer.
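A rough sketch of what such a GlobalNetworkPolicy can look like, assuming the PHI namespaces carry a data-classification label and using placeholder CIDRs for the external endpoints:

```yaml
# Hedged sketch of a cluster-wide egress allowlist for PHI namespaces.
# Label, CIDRs, and ports are placeholders for the real allowlist.
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: phi-egress-allowlist
spec:
  order: 100
  namespaceSelector: data-classification == "phi"
  types: ["Egress"]
  egress:
    - action: Allow
      protocol: TCP
      destination:
        nets: ["203.0.113.0/24"]     # EHR partner API range (placeholder)
        ports: [443]
    - action: Allow
      protocol: TCP
      destination:
        nets: ["198.51.100.10/32"]   # Splunk HEC endpoint (placeholder)
        ports: [8088]
    # Further Allow rules would cover the AWS KMS, IAM, and Secrets Manager endpoints.
    - action: Deny
```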
We deployed Falco for runtime threat detection on every node running PHI workloads, and we configured the rules for this workload to alert on four kinds of events:
a shell session starting inside a running pod,
a pod making an outbound connection to a destination outside the four Calico permits,
a process writing to a system directory inside a container,
and a service reading a file tagged as PHI without a corresponding role permission.
Every alert goes to Splunk, and the critical ones also notify the on-call engineer.
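As an illustration, the first of those events can be expressed as a custom Falco rule along these lines (the rule name, shell list, and output fields are assumptions, not the client’s actual rule file; the container macro comes from Falco’s default ruleset):

```yaml
# Hedged sketch of a Falco rule: alert when a shell starts inside a pod in a PHI namespace.
- rule: Shell spawned in a PHI namespace pod
  desc: An interactive shell started inside a container in one of the PHI namespaces.
  condition: >
    evt.type = execve and evt.dir = < and container
    and proc.name in (bash, sh, zsh)
    and k8s.ns.name in (clinical-api, imaging-svc, notifications, analytics-warehouse)
  output: >
    Shell in PHI pod (user=%user.name pod=%k8s.pod.name ns=%k8s.ns.name
    command=%proc.cmdline)
  priority: CRITICAL
  tags: [phi, runtime]
```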
In the CI pipeline, we configured Trivy to scan every image built for a PHI namespace. Any Critical or High CVE blocks the merge. We also added Cosign image signing at build time, and a Kyverno admission policy now verifies the signature before any pod can pull that image.
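The admission side can be sketched as a Kyverno ClusterPolicy that checks the Cosign signature on images from the approved registry before a pod is admitted (the registry path and public key are placeholders):

```yaml
# Hedged sketch: verify Cosign signatures on images deployed into PHI namespaces.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-phi-image-signatures
spec:
  validationFailureAction: Enforce
  rules:
    - name: verify-cosign-signature
      match:
        any:
          - resources:
              kinds: ["Pod"]
              namespaces: ["clinical-api", "imaging-svc", "notifications", "analytics-warehouse"]
      verifyImages:
        - imageReferences:
            - "111122223333.dkr.ecr.us-east-1.amazonaws.com/*"
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      <cosign public key goes here>
                      -----END PUBLIC KEY-----
```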
Want the same team on your Kubernetes HIPAA compliance project?
How We Tested the Kubernetes HIPAA Compliance Setup Before Going Live
Once the architecture work was complete, we spent four weeks testing it against the same 23 controls the audit had identified and the PHI flow map we built in week two. Three test activities did most of the work in verifying the Kubernetes HIPAA compliance architecture:
1. PHI Leak Scanning Through Logs, Metrics, and Traces
We ran static analysis on every service to find log statements, metric labels, and trace span attributes that could include PHI. The scan flagged 22 paths that required fixing: 14 in log statements, 5 in Prometheus metric labels, and 3 in OpenTelemetry trace spans.
The crash reporter was the worst offender. When a service threw an unhandled exception, the crash reporter SDK collected the local variables that were live at the time of the crash and bundled them in the error message it sent to the monitoring service. Many of those variables held patient names and Medical Record Numbers (MRNs).
We added a redaction layer for any attribute tagged as PHI in the data model, applied at the serializer level, so a single change covered logs, traces, and crash reports.
We verified the fix by feeding synthetic PHI through every entry point and searching every log destination for it. Nothing leaked.
2. Penetration Testing the Cluster Boundary and Workload Isolation
We brought in a third-party penetration testing firm to test the cluster’s infrastructure: the control plane, the workloads, and the configurations holding them together. The test scenarios were drawn from the audit findings:
Compromise a non-PHI namespace and try to reach a PHI namespace
Extract Secrets from etcd through a stolen developer credential with kubectl access
Bypass NetworkPolicies through the service mesh
Read PHI from a Velero backup snapshot stored in S3
The test produced three findings:
an over-permissive RoleBinding in the analytics-warehouse namespace,
a missing egress rule on the imaging-svc namespace,
and a pod that ran as root because its securityContext was set in the Helm chart but overridden by a values file.
All three issues were resolved within 10 days, and the firm retested the same cluster. The retest came back clean.
3. Compliance Verification Against the Pre-Project Control Inventory
We sat with the client’s compliance officer and walked through the 23-control spreadsheet one row at a time. For each control, we showed how the architecture was actually handling it:
For encryption at rest, we disabled the KMS provider on a test pod and tried to read Secrets through the API server. The values came as ciphertext that the kube-apiserver couldn’t decrypt without the KMS key.
For MFA, we tried logging into kubectl without MFA. Cognito blocked us.
For audit logs, we looked up a recent PHI access in Splunk. The full request and response were both there, with matching rows in the application’s own audit log.
For network segmentation, we sent test traffic from non-PHI namespaces toward PHI namespaces. Calico blocked every attempt.
The compliance officer was satisfied, and the cluster moved into the next step of the HIPAA review cycle.
Need help meeting HIPAA requirements for your application?
Bacancy’s HIPAA compliance services help healthcare teams audit, fix, and verify the controls their applications need to pass a HIPAA review.
Results from the Kubernetes HIPAA Compliance Project After 90 Days
The numbers below come from the client’s own monitoring dashboards 90 days after the cluster went live with the new setup.
| Metric | Before | After 90 Days |
| --- | --- | --- |
| etcd encryption mode | EBS volume only | KMS application-layer + EBS |
| API server audit log retention | 7 days | 6 years (365 days queryable, full term archived) |
| MFA coverage on cluster admin paths | ~30% | 100% |
| NetworkPolicy coverage on PHI namespaces | 0% (no policies) | 100% (block-by-default) |
| Mean time to detect suspicious pod activity | Undefined | ~4 minutes |
| PHI services with full audit-on-read coverage | 8 of 11 | 11 of 11 |
| Image scans before PHI deploys | None | Every CI run |
| Vulnerability scan frequency | Ad hoc | Biannual + on every build |
| Penetration test frequency | None in prior 24 months | Annual, scheduled |
| Open HIPAA findings on the cluster | 11 | 0 |
One row in the table needs context: the mean time to detect suspicious pod activity was undefined before the project because there was no runtime detection on the cluster, so there was nothing to measure. The four-minute figure reported after 90 days comes from controlled tests run during the testing phase: each time a tester opened a shell inside a PHI pod, Falco caught it and alerted the on-call engineer within roughly four minutes.
Beyond these numbers, in the same 90-day window, the client passed an internal HIPAA review, closed every issue flagged in the pre-project audit, and was added to the EHR partner’s preferred integrator list, which had previously required documented HIPAA controls that the client could not produce.
Conclusion
This Kubernetes HIPAA compliance project addressed every Security Rule control the client’s cluster needed in place before the updated HIPAA Security Rule takes effect, with the compliance officer involved from day one.
For healthcare organizations running PHI workloads on EKS, GKE, or AKS, the right place to start is an audit. Go through every control in the HIPAA Security Rule, find the Kubernetes resource responsible for each one, and check whether the cluster has it in place. The remediation that follows runs as a single project, not a series of parallel tickets.
But if you need expert guidance, Bacancy’s Kubernetes consulting services can take care of the work it takes to get your cluster ready for a HIPAA review.
Most Kubernetes HIPAA compliance projects run 10 to 16 weeks. The timeline depends on the size of the cluster, how many controls are unimplemented in the pre-project audit, the depth of PHI flow mapping needed, and whether the compliance officer is available for weekly review. Cluster size matters less than the number of services touching PHI, since each one needs its own audit logging and access control verification.
Most DevOps teams can handle the engineering side. Setting up encryption, NetworkPolicies, OPA Gatekeeper, and audit logging on Kubernetes is standard work for an experienced team. The audit is harder. It needs someone who knows the HIPAA Security Rule well enough to match each control to the right Kubernetes resource and document it for the compliance officer. Teams without HIPAA experience usually slow down here, which is when bringing in a specialist saves time.
Putting the work off carries two key risks. Insurance is the first. Most insurance companies now ask for documented HIPAA controls at policy renewal; healthcare clients without them either pay a higher premium or get coverage limits added. A breach during the delay is the second. If patient data is exposed before the controls are in place, OCR treats the case as a higher-tier violation, and the fines move up accordingly.
A Kubernetes HIPAA compliance project does not replace application-level compliance work. It covers the cluster: access to the API server, encryption of Secrets in etcd, traffic between pods, and audit logs from the cluster. The application running on top of the cluster has its own HIPAA work to do: how users log into the application, what they can see once they’re logged in, how the application stores PHI in its own databases, and how it logs clinical access. All of that is application-side compliance and runs as a separate project.