When Kubernetes Upgrades Reveal Hidden Technical Debt: Fixing kube-proxy Version Skew in EKS
Recently I ran into an interesting issue while reviewing the upgrade readiness of one of my Amazon EKS clusters.
What initially looked like a routine upgrade warning turned into a small investigation into:
Kubernetes version skew policies
EKS managed vs self-managed add-ons
Terraform cluster provisioning
Cilium networking configuration
In the end, the fix was simple — but the journey revealed something that often happens in long-running Kubernetes clusters: silent infrastructure drift.
This post walks through how I discovered the issue and how I resolved it.
The Warning
While reviewing the EKS Upgrade Insights dashboard, I saw the following alert:
kube-proxy version skew
Checks version of kube-proxy in cluster to see if upgrade would cause non compliance with supported Kubernetes kube-proxy version skew policy.
At first glance, this seemed minor. The cluster had been running perfectly fine.
But version skew warnings should always be investigated because they often indicate deeper compatibility issues.
Inspecting the kube-proxy Version
To check the running version of kube-proxy in the cluster, I inspected the daemonset:
kubectl -n kube-system get ds kube-proxy \
-o jsonpath='{.spec.template.spec.containers[0].image}'
The result:
602401143452.dkr.ecr.eu-west-2.amazonaws.com/eks/kube-proxy:v1.29.0-minimal-eksbuild.1
The cluster control plane version was:
1.35
This immediately explained the warning.
Kubernetes enforces a version skew policy between control plane components and kube-proxy.
The rule is:
kube-proxy must not be newer than the API server
kube-proxy may be up to three minor versions older
In my case:
Component       Version
Control Plane   1.35
kube-proxy      1.29
That’s a six-version gap, well outside the supported window.
Even though the cluster was functioning, this configuration was technically unsupported.
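The skew rule above is easy to script. The helper below is a hypothetical sketch (not an official tool): it compares the minor components of two "major.minor" versions and reports whether the gap falls inside the supported window of zero to three minor versions.

```shell
# Hypothetical skew check: compare "major.minor" versions of the control
# plane and kube-proxy against the supported window (0-3 minors older).
minor() { echo "$1" | cut -d. -f2; }

skew_ok() {
  cp_minor=$(minor "$1")   # control plane, e.g. 1.35 -> 35
  kp_minor=$(minor "$2")   # kube-proxy,    e.g. 1.29 -> 29
  gap=$((cp_minor - kp_minor))
  if [ "$gap" -ge 0 ] && [ "$gap" -le 3 ]; then
    echo "compliant (skew: $gap)"
  else
    echo "NOT compliant (skew: $gap)"
  fi
}

skew_ok 1.35 1.29   # the situation in this post -> NOT compliant (skew: 6)
skew_ok 1.35 1.33   # -> compliant (skew: 2)
```

Run against this cluster's versions, it reports a skew of six, well outside the window.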
Why Was kube-proxy So Old?
The next question was obvious:
Why hadn’t kube-proxy been upgraded automatically?
To answer that, I checked whether kube-proxy was installed as an EKS managed add-on.
aws eks describe-addon \
--cluster-name my-cluster \
--addon-name kube-proxy
The result:
ResourceNotFoundException: No addon: kube-proxy found in cluster
That meant kube-proxy was not managed by EKS.
Instead, it was running as a self-managed Kubernetes daemonset.
This made sense once I remembered how the cluster had been created.
The cluster was originally provisioned via Terraform, and at the time I had not explicitly configured EKS add-ons.
When clusters are created outside the AWS console, default components such as:
kube-proxy
CoreDNS
AWS VPC CNI
may exist only as self-managed resources unless they are explicitly installed as managed add-ons.
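A quick way to see which components are EKS-managed is to list the cluster's add-ons (the cluster name is a placeholder, and the call needs AWS credentials, so it is shown here for reference only). The small helper below is a hypothetical convenience for checking the returned JSON without jq.

```shell
# For reference (requires credentials; "my-cluster" is a placeholder):
#   aws eks list-addons --cluster-name my-cluster
#
# Hypothetical helper: check whether an add-on name appears in the JSON.
has_addon() {
  echo "$1" | grep -q "\"$2\"" && echo managed || echo self-managed
}

sample='{"addons": ["coredns", "vpc-cni"]}'   # example response shape
has_addon "$sample" coredns      # -> managed
has_addon "$sample" kube-proxy   # -> self-managed
```

An empty addons list, as in this cluster, means all default components are self-managed.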
A Complication: Cilium Networking
This cluster also uses Cilium as the CNI instead of the default AWS VPC CNI.
Before modifying kube-proxy, I needed to confirm whether Cilium was operating in kube-proxy replacement mode.
If it was, kube-proxy would not be required at all.
Checking the Cilium configuration:
kubectl -n kube-system get cm cilium-config -o yaml | grep kubeProxyReplacement
The command returned nothing.
I also confirmed the daemonset was actively running:
kubectl -n kube-system get ds kube-proxy
Result:
kube-proxy 6 6 6
This confirmed that kube-proxy was still responsible for Kubernetes Service routing.
Cilium was providing networking and policy features, but not replacing kube-proxy.
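Interpreting that config value can be sketched as a small helper. This is an assumption-laden sketch: it treats an empty value as "key unset, kube-proxy still required", and treats "true" or "strict" (the values Cilium has used for full replacement mode) as "kube-proxy not required".

```shell
# Sketch: interpret the kubeProxyReplacement value read from cilium-config.
# Assumption: empty means the key is unset, so kube-proxy still handles
# Service routing; "true"/"strict" indicate full replacement mode.
kpr_mode() {
  case "$1" in
    true|strict) echo "kube-proxy not required" ;;
    *)           echo "kube-proxy still required" ;;
  esac
}

# The grep against cilium-config returned nothing, i.e. an empty value:
kpr_mode ""        # -> kube-proxy still required
kpr_mode strict    # -> kube-proxy not required
```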
The Solution: Migrating kube-proxy to an EKS Managed Add-on
Since kube-proxy was self-managed and significantly outdated, the best solution was to migrate it to the EKS managed add-on system.
Managed add-ons provide several advantages:
automatic compatibility with cluster versions
simplified upgrades
consistent lifecycle management
integration with EKS upgrade workflows
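Before pinning an add-on version, it helps to discover which builds are compatible with the target cluster version. The AWS CLI call below needs credentials, so it is shown for reference; the helper that picks the newest build from such a list is a local sketch.

```shell
# For reference (requires credentials): list compatible kube-proxy builds.
#   aws eks describe-addon-versions \
#     --addon-name kube-proxy \
#     --kubernetes-version 1.35 \
#     --query 'addons[].addonVersions[].addonVersion' \
#     --output text
#
# Picking the newest build from such a list is a version sort:
newest_addon_version() { printf '%s\n' "$@" | sort -V | tail -n 1; }

newest_addon_version v1.35.0-eksbuild.1 v1.35.0-eksbuild.2 v1.34.1-eksbuild.2
# -> v1.35.0-eksbuild.2
```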
Because the cluster itself is managed with Terraform, I chose to manage the add-on declaratively as well.
I added the following Terraform resource:
resource "aws_eks_addon" "kube_proxy" {
  cluster_name                = aws_eks_cluster.eks_cluster.name
  addon_name                  = "kube-proxy"
  addon_version               = "v1.35.0-eksbuild.2"
  resolve_conflicts_on_create = "OVERWRITE"
  resolve_conflicts_on_update = "OVERWRITE"
}
The OVERWRITE flag is important.
Because a self-managed daemonset already existed, this tells EKS to replace the existing configuration with the managed add-on.
Applying the Change
After adding the resource, I ran:
terraform plan
The output showed:
+ aws_eks_addon.kube_proxy will be created
Importantly:
The EKS cluster itself was not recreated.
Add-ons are independent resources and can be installed or updated without affecting the control plane.
After applying the change, I verified the add-on status:
aws eks describe-addon \
--cluster-name my-cluster \
--addon-name kube-proxy
Result:
ACTIVE
The kube-proxy daemonset was now running the correct version for Kubernetes 1.35.
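As a final sanity check, the same jsonpath query from earlier shows the new image tag. The kubectl call needs a live cluster, so the parsing helper below is demonstrated with sample tags instead.

```shell
# For reference (requires a live cluster):
#   kubectl -n kube-system get ds kube-proxy \
#     -o jsonpath='{.spec.template.spec.containers[0].image}'
#
# Extract the Kubernetes "major.minor" from a kube-proxy image tag:
image_minor() { echo "$1" | sed -E 's/.*:v([0-9]+\.[0-9]+).*/\1/'; }

image_minor "602401143452.dkr.ecr.eu-west-2.amazonaws.com/eks/kube-proxy:v1.35.0-minimal-eksbuild.2"
# -> 1.35
```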
Lessons Learned
Even small upgrade warnings can reveal interesting platform insights.
Here are a few takeaways from this experience.
Long-Running Clusters Accumulate Drift
Clusters that have been running for years often contain components installed using older practices.
Without explicit lifecycle management, these components may quietly fall out of compliance.
EKS Managed Add-ons Reduce Operational Burden
Moving critical components such as kube-proxy and CoreDNS to managed add-ons simplifies cluster maintenance and reduces upgrade risks.
Networking Choices Affect Upgrade Strategy
Using alternative CNIs like Cilium introduces additional considerations.
Before modifying cluster components, it’s important to understand which parts of the networking stack are actually responsible for service routing.
Infrastructure as Code Should Include Add-ons
If you are managing EKS with Terraform, it is worth managing add-ons explicitly as well.
This keeps the entire cluster lifecycle declarative and avoids surprises during upgrades.
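One refinement worth considering, sketched below under the assumption of a reasonably recent Terraform AWS provider: instead of hard-coding the add-on version, the aws_eks_addon_version data source can resolve the newest build compatible with the cluster's current version, so add-on upgrades follow cluster upgrades automatically.

```hcl
# Sketch: resolve the add-on version from the cluster version instead of
# hard-coding it (assumes a recent Terraform AWS provider).
data "aws_eks_addon_version" "kube_proxy" {
  addon_name         = "kube-proxy"
  kubernetes_version = aws_eks_cluster.eks_cluster.version
  most_recent        = true
}

resource "aws_eks_addon" "kube_proxy" {
  cluster_name                = aws_eks_cluster.eks_cluster.name
  addon_name                  = "kube-proxy"
  addon_version               = data.aws_eks_addon_version.kube_proxy.version
  resolve_conflicts_on_create = "OVERWRITE"
  resolve_conflicts_on_update = "OVERWRITE"
}
```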
Final Thoughts
Kubernetes platforms evolve quickly, and clusters that were created with one set of assumptions can drift away from best practices over time.
This issue was ultimately straightforward to resolve, but it highlights how important it is to periodically review the underlying components of your cluster — not just the workloads running on top of it.
Sometimes a simple upgrade warning is actually an opportunity to improve the overall platform architecture.
If you’re interested in platform engineering, Kubernetes infrastructure, and DevSecOps, I’ll be writing more posts about real-world problems like this as they arise.