Downtime after upgrading to 1.12.0 - Open "/tmp/nginx/nginx.pid" failed #12645
Labels
kind/bug
Categorizes issue or PR as related to a bug.
needs-priority
needs-triage
Indicates an issue or PR lacks a `triage/foo` label and requires one.
Keeping this open while our investigation is running. We cannot explain it yet.
I will fill in more details as soon as we understand it better.
As it broke only a few environments, it is harder to debug.
But it is a warning to check your log lines during the upgrade.
What happened:
Upgraded our ingress-controller via helm from
to
This caused a major outage on 4 of 10 clusters. We do not yet understand why.
Kubernetes version 1.31.x
What you expected to happen:
Ingress controller continues to work.
I am not sure yet. I will keep this open while we investigate further.
Kubernetes version (use `kubectl version`): v1.31.3-eks-59bf375
Environment:
AWS / EKS
AWS
More data will follow once we have done a breakdown.
How to reproduce this issue:
Hard to reproduce, as it is currently happening on nodes that we cannot test again.
Update 10.01 - 00:10 - Tested a deployment of the faulty version again. On some domains the controller was serving the Kubernetes fake certificate, whereas the old version served the real Let's Encrypt certificates. This looks like a TLS issue after the upgrade.
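To check which domains are affected, a rough sketch like the following might help. It fetches the certificate a host serves, without verifying it, and looks for the name of the ingress-nginx default certificate ("Kubernetes Ingress Controller Fake Certificate") in the raw bytes. The byte search is a shortcut that avoids a full X.509 parser, not a proper subject check, and the function names are made up for illustration.

```python
import socket
import ssl
import sys

# Common name used by the ingress-nginx default (self-signed) certificate.
# Searching the raw DER bytes for it is a shortcut, not a full X.509 parse.
FAKE_CERT_MARKER = b"Kubernetes Ingress Controller Fake Certificate"


def is_fake_certificate(der_bytes: bytes) -> bool:
    """Return True if the DER-encoded certificate contains the fake-cert name."""
    return FAKE_CERT_MARKER in der_bytes


def fetch_certificate(host: str, port: int = 443, timeout: float = 5.0) -> bytes:
    """Return the DER-encoded leaf certificate served for `host` (sent via SNI)."""
    context = ssl.create_default_context()
    # Verification is disabled on purpose: the fake certificate is self-signed
    # and would otherwise abort the handshake before we can inspect it.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert(binary_form=True)


if __name__ == "__main__":
    # Usage: python check_certs.py <domain> [<domain> ...]
    for domain in sys.argv[1:]:
        status = "FAKE" if is_fake_certificate(fetch_certificate(domain)) else "ok"
        print(f"{domain}: {status}")
```

Passing the list of ingress hostnames for a cluster gives a quick view of which domains fall back to the default certificate after the upgrade.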