Inaccessible LoadBalancer services when using OVN Octavia provider #2333

Open
m-bull opened this issue Dec 16, 2024 · 6 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments


m-bull commented Dec 16, 2024

/kind bug

What steps did you take and what happened:
Since #2128, when using OVN Octavia provider, LoadBalancer services that were accessible when NodePorts were open from 0.0.0.0/0 are no longer accessible, because Security Groups are created which restrict NodePorts to traffic sourced from only the cluster network.

What did you expect to happen:
No change in behaviour from previous versions.

Anything else you would like to add:
This isn't strictly a bug, but it is a change in behaviour that I spent some time chasing down and might be worth documenting at least here in case it saves someone else the trouble!

As mentioned here, when using the Amphora provider for Octavia, traffic to LB members originates from within the cluster CIDR. The same is not true with the OVN provider, where traffic towards LB members originates from outside the cluster CIDR (at least in my hands). This means that creating a Security Group for workers that only allows NodePort traffic from inside the cluster CIDR, rather than from 0.0.0.0/0, breaks existing configurations that use the OVN Octavia provider underneath, possibly unexpectedly.

A small config change on the OCCM side, allowing it to manage Security Groups itself, makes this all work, but it does potentially restore the exposure of NodePorts to the internet:

[LoadBalancer]
manage-security-groups=true
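
If you would rather not let OCCM manage Security Groups, the old behaviour can be restored with a manual rule. The following is only a hedged sketch, not something from this thread: the security group name is a placeholder and the port range assumes the default Kubernetes NodePort range.

```shell
# Hypothetical example: manually re-open the NodePort range on the workers'
# security group. <worker-security-group> and the 30000:32767 range are
# assumptions; adjust them for your cluster.
openstack security group rule create \
  --ingress \
  --protocol tcp \
  --dst-port 30000:32767 \
  --remote-ip 0.0.0.0/0 \
  <worker-security-group>
```

Note that, like manage-security-groups=true, this re-exposes NodePorts to any source address, which is exactly the exposure the change in #2128 was trying to remove.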

Environment:

  • Cluster API Provider OpenStack version (Or git rev-parse HEAD if manually built): v0.11.3
  • Cluster-API version: v1.8.5
  • OpenStack version: Caracal
  • Kubernetes version (use kubectl version): 1.31.2
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 16, 2024
Contributor

mdbooth commented Dec 17, 2024

This looks like an oversight in the original PR and is clearly a regression.

I suspect that the correct fix here is to add a default rule covering OVN Octavia traffic. Do you know what that rule would look like?

Author

m-bull commented Dec 17, 2024

From a few quick tests, it seems that the source IP of packets arriving via an OVN load balancer is preserved all the way to the destination. I think that means the remote group can realistically only be 0.0.0.0/0 if we want to restore the original behaviour, which unfortunately doesn't really help from the point of view of tightening up Security Group rules.
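
One rough way to confirm the source-IP behaviour described above is to watch NodePort traffic directly on a worker node. This is a hypothetical sketch: the port number is just an example NodePort, and it assumes shell access to the node.

```shell
# Hypothetical check (assumes node access; 31380 is an example NodePort):
# capture traffic arriving at the NodePort and inspect source addresses.
sudo tcpdump -ni any tcp port 31380
```

With the OVN provider the captured source IPs are the external clients' own addresses (preserved end to end), whereas with Amphora they come from inside the cluster CIDR.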

Contributor

mkjpryor commented Dec 17, 2024

@mdbooth

As @m-bull says, because of how OVN works the fix is basically to revert the change… Are we happy doing that? Because of how the networking is set up, I’m not really seeing what benefits the tighter security groups give in this case TBH, unless people are in the habit of putting floating IPs on their worker nodes? Maybe they are…

Contributor

MaysaMacedo commented Dec 17, 2024

@mkjpryor

Hello, the change you propose to revert is also beneficial when the Amphora driver is used, right?
Perhaps, instead of reverting, we could document that users who need to open any additional rules (which is the ovn-octavia case) should use managed security groups. Thoughts?

Contributor

mkjpryor commented Dec 18, 2024

It depends on whether you are happy to introduce such a severe regression in a point release, TBH. I probably wouldn't be; personally I would revert this change, issue a new point release, and then, if we still want the change, it can go into v0.12.0.

P.S. I know SemVer doesn’t guarantee any backwards compatibility until 1.0.0, but in reality people are using this in prod and we can’t just do that.

Contributor

mkjpryor commented Dec 18, 2024

Also, I think the benefits it brings in the Amphora case are minimal - usually the LB and worker nodes are all on the same private network and all access to the nodes is mediated via the LBs.

I would actually argue that the case that is different, and should be documented, is the exact opposite one: if all your CAPI clusters are going onto one big shared network, you probably want different secgroups to shut that access down.
