Automatic deletion of all providers #562
Comments
This issue is currently awaiting triage. If CAPI Operator contributors determine this is a relevant issue, they will accept it by applying the appropriate triage label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
We see something which could be related to that with one of the CRDs. We have no clue why this happened. @dtzar, were you able to solve that?
In our case, ArgoCD was OutOfSync because the ca-bundle was injected.
We install cert-manager separately and have a sleep hook which checks to make sure it is available before the install. Then we've tried various ways (raw manifest YAML and Helm chart) to do the install, to no avail. The most painful part is that the exact same configuration will work one time and fail another. There is some kind of race condition that puts things into this state. I've lost many painful hours to this problem and am still not sure of the cause. I haven't been able to reproduce it without ArgoCD, but as far as I can see, ArgoCD should NOT automatically delete anything. It should only keep trying to apply the YAML until it matches what it sees in git.
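A sleep hook like the one described is typically implemented as an ArgoCD PreSync hook Job that blocks the sync until cert-manager is ready. A minimal sketch, assuming a `cert-manager` namespace and standard deployment names (the image, service account, and wait target are assumptions, not the reporter's actual manifest):

```yaml
# Hypothetical PreSync hook: the sync does not proceed until the Job succeeds.
apiVersion: batch/v1
kind: Job
metadata:
  name: wait-for-cert-manager
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded  # clean up after success
spec:
  template:
    spec:
      serviceAccountName: wait-for-cert-manager  # assumed SA with RBAC to read deployments
      restartPolicy: Never
      containers:
        - name: wait
          image: bitnami/kubectl:latest  # any image with kubectl works
          command:
            - kubectl
            - wait
            - --namespace=cert-manager
            - --for=condition=Available
            - deployment/cert-manager-webhook
            - --timeout=300s
```

Note that a hook Job that never completes would stall the sync indefinitely, which matches the behavior described later in this thread.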
It seems that for some reason the namespaces are getting deleted, but I can't tell from where. From the ArgoCD UI:
From the cluster itself:
From the capi-operator-system log:
When ArgoCD deploys the raw YAML from this:
It works (doesn't auto-delete) with one minor error message (though the capi-operator-system namespace DOES exist and has a pod in it). However, as soon as I add in the infrastructure provider (just the diff in YAML from this command), the behavior comes back.
I updated the raw YAML output to the 0.12.0 release. Good news: it no longer automatically terminates ALL of the namespaces; now it only terminates the azure-infrastructure-system namespace. Bad news: only a single pod remains persistently running. I am attaching the full raw capi-operator log for reference. You can reproduce this behavior 100% of the time - see instructions in the linked issue.
Just tested this again with the latest CAPI-Operator release and the latest version of ArgoCD. The pre-sync hook still stalls indefinitely, so I have to terminate the sync to make the Cluster API Operator deployment move forward. I'm not sure whether this is what is causing all the namespaces to get deleted (they don't even exist yet when I terminate the sync job), but from the K8s API server audit logs I can confirm the deletions come from ArgoCD. Open to ideas on how to make ArgoCD NOT delete things and have the install work, as well as how to avoid the indefinite sleep-hook problem. See the linked issue on ArgoCD.
Removed the sleep sync hook, tweaked cert-manager to be ordered before all the other apps, and added a delay. CAPI-Operator unfortunately fails due to it not having the proper certs. I did see that the 60 second delay took effect AND that CAPI-Operator didn't even start to try to install until the cert-manager pods said they were ready. When I manually applied the cert-manager-related YAML from the CAPI-operator chart and deleted the capi-operator-controller pod, capi-operator started successfully, and then ArgoCD did its job (using sync-waves) to try to create the namespaces next. Namespaces get created, then you can see ArgoCD trying to create some additional things, and then BAM - namespaces terminated. I set up ArgoCD debug logging before the deletion and captured the logs through the deletion process. Unfortunately, there is no obvious culprit that I can see. Attached logs if you're curious.
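Ordering cert-manager before the other apps is usually done with sync-wave annotations, where lower wave values sync first. A sketch under assumptions (the Application names are illustrative, and `spec` fields are omitted for brevity):

```yaml
# Hypothetical ordering: cert-manager syncs in an earlier wave than capi-operator.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cert-manager
  annotations:
    argocd.argoproj.io/sync-wave: "-1"  # negative wave syncs before the default wave 0
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: capi-operator
  annotations:
    argocd.argoproj.io/sync-wave: "0"   # syncs only after wave -1 is healthy
```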
Have you tried `PrunePropagationPolicy=background`?
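In ArgoCD this is set as a sync option on the Application. A minimal sketch, assuming an Application named `capi-operator` (remaining `spec` fields omitted):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: capi-operator  # name is an assumption
spec:
  syncPolicy:
    syncOptions:
      - PrunePropagationPolicy=background  # pruned resources are deleted with background cascading
```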
Yes, this is because the providers are using hooks, and so by default you get the hook deletion policies. Doc:
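The hook deletion behavior can be overridden per resource with the standard Helm hook annotations; a hedged sketch (the Job shown is illustrative, not a resource from the chart):

```yaml
# Hypothetical hook resource with an explicit deletion policy.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-hook  # illustrative name
  annotations:
    helm.sh/hook: post-install
    # Delete the previous hook resource only just before a new hook is created,
    # instead of deleting it as soon as it succeeds.
    helm.sh/hook-delete-policy: before-hook-creation
```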
What steps did you take and what happened:
Install the helm chart using ArgoCD; it starts to do the install and then, for some unknown reason, capi-operator-system deletes everything.
What did you expect to happen:
capi-operator-system doesn't initiate deletion
Environment:
- Kubernetes version (use `kubectl version`): 1.30
- OS (e.g. from `/etc/os-release`): WSL/kind

/kind bug