Downtime after upgrading to 1.12.0 - Open "/tmp/nginx/nginx.pid" failed #12645

thomaspeitz · 2025-01-09T08:36:21Z

Keeping this open while our investigation is running. We cannot explain it yet.
We will add more details as soon as we understand it better.
As it broke only a few environments, it is harder to debug.
But take it as a warning to check your log lines during the upgrade.

/tmp/nginx/nginx.pid

What happened:
We upgraded our ingress controller via Helm from

version: 4.11.3

to

version: 4.12.0

This caused a major outage on 4 of 10 clusters. We do not yet understand why.
Kubernetes version 1.31.x

| Time | Message |
| -- | -- |
| 2025-01-09 07:45:35.583 | nginx: [error] open() "/tmp/nginx/nginx.pid" failed (2: No such file or directory) |
| 2025-01-09 07:45:35.583 | 2025/01/09 07:45:35 [error] 215#215: open() "/tmp/nginx/nginx.pid" failed (2: No such file or directory) |
| 2025-01-09 07:45:35.583 | nginx: [error] open() "/tmp/nginx/nginx.pid" failed (2: No such file or directory) |
| 2025-01-09 07:45:35.583 | 2025/01/09 07:45:35 [error] 215#215: open() "/tmp/nginx/nginx.pid" failed (2: No such file or directory) |
| 2025-01-09 07:45:35.583 | nginx: [error] open() "/tmp/nginx/nginx.pid" failed (2: No such file or directory) |
| 2025-01-09 07:45:35.583 | 2025/01/09 07:45:35 [error] 215#215: open() "/tmp/nginx/nginx.pid" failed (2: No such file or directory) |
| 2025-01-09 07:45:35.000 | name=ingress-nginx-general-r6-controller-565c5966f7-8p4rq kind=Pod objectAPIversion=v1 objectRV=2931225444 eventRV=2931226571 reportingcontroller=nginx-ingress-controller sourcecomponent=nginx-ingress-controller reason=RELOAD type=Warning count=1 msg="Error reloading NGINX: exit status 1\n2025/01/09 07:45:35 [notice] 215#215: signal process started\n2025/01/09 07:45:35 [error] 215#215: open() \"/tmp/nginx/nginx.pid\" failed (2: No such file or directory)\nnginx: [error] open() \"/tmp/nginx/nginx.pid\" failed (2: No such file or directory)\n" |
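
For anyone who wants to watch for this during their own upgrade, here is a minimal Python sketch that filters controller log lines for the `open() ... failed` error shown above. The function name `find_pid_errors` and the regular expression are my own illustration, not part of ingress-nginx; the pattern is only derived from the log lines in this report and may miss other variants.

```python
import re

# Matches nginx "open() <path> failed (<errno>: <reason>)" messages.
# The optional backslashes cover the escaped quotes seen inside the
# Kubernetes event message (msg="... open() \"/tmp/...\" failed ...").
OPEN_FAILED = re.compile(
    r'open\(\) \\?"(?P<path>[^"\\]+)\\?" failed '
    r'\((?P<errno>\d+): (?P<reason>[^)]+)\)'
)


def find_pid_errors(lines):
    """Return a (path, errno, reason) tuple for every open() failure found."""
    hits = []
    for line in lines:
        for match in OPEN_FAILED.finditer(line):
            hits.append(
                (match.group("path"), int(match.group("errno")), match.group("reason"))
            )
    return hits
```

Any iterable of strings works as input, so the output of `kubectl logs` for the controller pod could be piped in and passed as `sys.stdin`.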

What you expected to happen:
Ingress controller continues to work.

I am not sure yet. I am keeping this open while we investigate further.

Kubernetes version (use kubectl version):
v1.31.3-eks-59bf375

Environment:
AWS / EKS

Cloud provider or hardware configuration:
AWS
OS (e.g. from /etc/os-release):
Kernel (e.g. uname -a):
Install tools:
- Please mention how/where was the cluster created like kubeadm/kops/minikube/kind etc.
Basic cluster related info:
- kubectl version
- kubectl get nodes -o wide

More data will follow once we have completed our analysis.

How to reproduce this issue:
Hard to reproduce as it is currently happening on the nodes which we cannot test again.

Update 10.01 - 00:10 - Tested a deployment of the faulty version again. On some domains the SSL certs were served as K8s fake certs, whereas the old version was serving the real Let's Encrypt certs. Looks like a TLS issue after the upgrade.
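
A rough way to check which certificate a domain serves after the upgrade is sketched below in Python, using only the standard library. It assumes the fake certificate is self-signed and therefore fails normal verification; that is an assumption on my side, not something confirmed in this investigation. The helper names `issuer_field` and `check_served_cert` are hypothetical.

```python
import socket
import ssl


def issuer_field(cert, field):
    """Return one issuer attribute (e.g. "organizationName") from the dict
    returned by SSLSocket.getpeercert(), or None if it is absent."""
    for rdn in cert.get("issuer", ()):
        for key, value in rdn:
            if key == field:
                return value
    return None


def check_served_cert(host, port=443, timeout=5.0):
    """Do a verified TLS handshake with SNI set to host.

    Returns ("trusted", issuer organization) when verification succeeds,
    or ("untrusted", verification message) when it fails, which is what
    a self-signed fallback certificate would be expected to produce.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return "trusted", issuer_field(tls.getpeercert(), "organizationName")
    except ssl.SSLCertVerificationError as exc:
        return "untrusted", exc.verify_message
```

Running `check_served_cert` against each affected domain before and after the upgrade should show whether the served certificate changed. Connection errors other than certificate verification failures are deliberately not caught here.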


k8s-ci-robot · 2025-01-09T08:36:30Z

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

thomaspeitz added the kind/bug Categorizes issue or PR as related to a bug. label Jan 9, 2025

k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jan 9, 2025

k8s-ci-robot added the needs-priority label Jan 9, 2025

strongjz added this to [SIG Network] Ingress NGINX Jan 9, 2025
