make components on control-plane nodes point to the local API server endpoint #2271

neolit123 · 2020-08-31T21:23:01Z

in CAPI immutable upgrades we saw a problem where a 1.19 joining node cannot bootstrap if a 1.19 KCM takes leadership and tries to send a CSR to a 1.18 API server on an existing Node. this happens because in 1.19 the CSR API graduated to v1, and a KCM of version N is supposed to talk only to an API server of version N or N+1.
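A minimal sketch of the skew rule stated above, assuming versions are compared by minor number within the same major version. The helper `kcmSkewAllowed` is hypothetical and not part of kubeadm; it only encodes "a KCM at N may talk to an API server at N or N+1".

```go
package main

import "fmt"

// kcmSkewAllowed is a hypothetical helper modelling the rule quoted above:
// a kube-controller-manager at minor version N may only talk to a
// kube-apiserver at minor version N or N+1 (never an older API server).
// Only minor versions within the same major version are compared.
func kcmSkewAllowed(kcmMinor, apiServerMinor int) bool {
	return apiServerMinor == kcmMinor || apiServerMinor == kcmMinor+1
}

func main() {
	// A 1.19 KCM that wins leader election while a 1.18 API server is still
	// serving behind the load balancer: the policy is violated.
	fmt.Println(kcmSkewAllowed(19, 18)) // false
	// The same KCM talking to its local 1.19 API server is fine.
	fmt.Println(kcmSkewAllowed(19, 19)) // true
}
```

Pointing the KCM at its own node's API server keeps both on the same version during an immutable upgrade, which is why the rule is never violated in that setup.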

a better explanation here:
https://kubernetes.slack.com/archives/C8TSNPY4T/p1598907959059100?thread_ts=1598899864.038100&cid=C8TSNPY4T

we should make the controller-manager.conf and scheduler.conf that kubeadm generates talk to the local API server and not to the controlPlaneEndpoint (CPE, e.g. LB).
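A rough sketch of how the server URL for controller-manager.conf and scheduler.conf could be derived from the node's own endpoint rather than the CPE. The helper `localAPIServerURL` is hypothetical; its parameters are assumed to mirror the `advertiseAddress` and `bindPort` fields of kubeadm's `localAPIEndpoint`.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// localAPIServerURL is a hypothetical helper: it builds the server URL that
// the scheduler and KCM kubeconfig files would use when pointing at the API
// server running on the same node, instead of the controlPlaneEndpoint.
// advertiseAddress and bindPort are assumed to mirror localAPIEndpoint.
func localAPIServerURL(advertiseAddress string, bindPort int) string {
	// JoinHostPort adds the square brackets needed for IPv6 literals.
	return "https://" + net.JoinHostPort(advertiseAddress, strconv.Itoa(bindPort))
}

func main() {
	cpe := "https://apiserver.cluster.local:6443" // shared endpoint, e.g. an LB
	fmt.Println("CPE:  ", cpe)
	fmt.Println("local:", localAPIServerURL("192.168.160.243", 6443)) // https://192.168.160.243:6443
	fmt.Println("ipv6: ", localAPIServerURL("fd00::10", 6443))        // https://[fd00::10]:6443
}
```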
PR for 1.20: kubeadm: make the scheduler and KCM connect to the local API endpoint kubernetes#94398
PR for 1.19: Automated cherry pick of #94398: kubeadm: make the scheduler and KCM connect to local endpoint kubernetes#94442
relax the server URL validation in kubeconfig files:
make components on control-plane nodes point to the local API server endpoint #2271 (comment)
kubeadm: relax the validation of kubeconfig server URLs kubernetes#94816

optionally we should see if we can make the kubelet on control-plane Nodes bootstrap via the local API server instead of using the CPE. this might be a bit tricky and needs investigation. we could at least post-fix the kubelet.conf to point to the local API server after the bootstrap has finished.
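The "post-fix" idea amounts to rewriting `clusters[].cluster.server` in kubelet.conf once bootstrap has finished. A minimal sketch follows, assuming a hand-written mirror of only the relevant kubeconfig fields; the types and `pointToLocalAPIServer` are hypothetical, and real code would more likely load and save the file with client-go's `clientcmd` package.

```go
package main

import "fmt"

// Hypothetical, minimal mirror of the kubeconfig fields relevant here
// (clusters[].cluster.server). Not the client-go types.
type Cluster struct {
	Server string
}

type NamedCluster struct {
	Name    string
	Cluster Cluster
}

type KubeConfig struct {
	Clusters []NamedCluster
}

// pointToLocalAPIServer rewrites every cluster entry whose server equals
// fromURL (e.g. the controlPlaneEndpoint) to toURL (the local API server)
// and returns how many entries were changed.
func pointToLocalAPIServer(cfg *KubeConfig, fromURL, toURL string) int {
	changed := 0
	for i := range cfg.Clusters {
		if cfg.Clusters[i].Cluster.Server == fromURL {
			cfg.Clusters[i].Cluster.Server = toURL
			changed++
		}
	}
	return changed
}

func main() {
	kubeletConf := &KubeConfig{Clusters: []NamedCluster{
		{Name: "kubernetes", Cluster: Cluster{Server: "https://apiserver.cluster.local:6443"}},
	}}
	n := pointToLocalAPIServer(kubeletConf,
		"https://apiserver.cluster.local:6443", "https://192.168.160.243:6443")
	fmt.Println(n, kubeletConf.Clusters[0].Cluster.Server)
}
```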
see kubernetes/kubernetes#80774 for a related discussion

this change requires a more detailed plan, a feature gate and a KEP

1.31

KEP:

1.32

TODO. Move the FG to beta?

neolit123 · 2020-09-01T15:06:51Z

first PR is here: kubernetes/kubernetes#94398

neolit123 · 2020-09-02T17:30:30Z

we spoke about the kubelet.conf in the office hours today:

Pointing the kubelet to the local api server should work, but the kubelet-start phase has to happen after the control-plane manifests are written on disk for CP nodes.
Requires phase reorder and we are considering using a feature gate.
This avoids skew problems of a new kubelet trying to bootstrap against an old api-server.
One less component to point to the CPE.
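The reorder requirement boils down to an ordering constraint between two phases. A small sketch of such a check, assuming phases are represented as an ordered list of names; `runsAfter` is hypothetical and the phase lists below are illustrative, not kubeadm's exact phase list.

```go
package main

import "fmt"

// phaseIndex returns the position of name in phases, or -1 if it is absent.
func phaseIndex(phases []string, name string) int {
	for i, p := range phases {
		if p == name {
			return i
		}
	}
	return -1
}

// runsAfter reports whether phase later is scheduled after phase earlier.
// Both phases must be present in the list.
func runsAfter(phases []string, earlier, later string) bool {
	e, l := phaseIndex(phases, earlier), phaseIndex(phases, later)
	return e >= 0 && l >= 0 && e < l
}

func main() {
	// Hypothetical phase lists, for illustration only.
	kubeletFirst := []string{"preflight", "certs", "kubeconfig", "kubelet-start", "control-plane"}
	manifestsFirst := []string{"preflight", "certs", "kubeconfig", "control-plane", "kubelet-start"}
	fmt.Println(runsAfter(kubeletFirst, "control-plane", "kubelet-start"))   // false
	fmt.Println(runsAfter(manifestsFirst, "control-plane", "kubelet-start")) // true
}
```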

i'm going to experiment and see how it goes, but this cannot be backported to older releases, as it is a breaking change for users of the kubeadm phases.

zhangguanzhang · 2020-09-10T07:46:08Z

This breaks the rules: the controlPlaneEndpoint may be a domain name, and if it is, it will not work after your code change.

neolit123 · 2020-09-10T13:03:46Z

This breaks the rules: the controlPlaneEndpoint may be a domain name, and if it is, it will not work after your code change.

can you clarify with examples?

neolit123 · 2020-09-10T13:04:54Z

@jdef added a note that some comments were left invalid after the recent change:
https://github.com/kubernetes/kubernetes/pull/94398/files/d9441906c4155173ce1a75421d8fcd1d2f79c471#r486252360

this should be fixed in master.

neolit123 · 2020-09-10T13:09:54Z

someone else added a comment on kubernetes/kubernetes#94398
but later deleted it:

when using the method CreateJoinControlPlaneKubeConfigFiles with a controlPlaneEndpoint like apiserver.cluster.local to generate the config files, and then running kubeadm init --config=/root/kubeadm-config.yaml --upload-certs -v 5,
the error looks like this:

I0910 15:15:54.436430   52511 kubeconfig.go:84] creating kubeconfig file for controller-manager.conf
currentConfig.Clusters[currentCluster].Server:  https://apiserver.cluster.local:6443 
config.Clusters[expectedCluster].Server:  https://192.168.160.243:6443
a kubeconfig file "/etc/kubernetes/controller-manager.conf" exists already but has got the wrong API Server URL

this validation should be turned into a warning instead of an error. then components would fail if they don't point to a valid API server, so the user would know.

zhangguanzhang · 2020-09-10T13:16:35Z

This breaks the rules: the controlPlaneEndpoint may be a domain name, and if it is, it will not work after your code change.

can you clarify with examples?

see this doc: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/#steps-for-the-first-control-plane-node

--control-plane-endpoint "LOAD_BALANCER_DNS:LOAD_BALANCER_PORT"

neolit123 · 2020-09-10T13:21:35Z

i do know about that doc. are you saying that using "DNS-name:port" is completely broken now for you? what error output are you seeing?
i did test this during my work on the changes and it worked fine.

jdef · 2020-09-10T13:26:12Z

someone else added a comment on kubernetes/kubernetes#94398
but later deleted it:

when using the method CreateJoinControlPlaneKubeConfigFiles with a controlPlaneEndpoint like apiserver.cluster.local to generate the config files, and then running kubeadm init --config=/root/kubeadm-config.yaml --upload-certs -v 5,
the error looks like this:

I0910 15:15:54.436430   52511 kubeconfig.go:84] creating kubeconfig file for controller-manager.conf
currentConfig.Clusters[currentCluster].Server:  https://apiserver.cluster.local:6443 
config.Clusters[expectedCluster].Server:  https://192.168.160.243:6443
a kubeconfig file "/etc/kubernetes/controller-manager.conf" exists already but has got the wrong API Server URL

this validation should be turned into a warning instead of an error. then components would fail if they don't point to a valid API server, so the user would know.

yes, please. this just bit us when testing a workaround in a pre-1.19.1 cluster whereby we tried manually updating clusters[].cluster.server in (scheduler, controller-manager .conf) to point to localhost instead of the official controlplane endpoint.

zhangguanzhang · 2020-09-10T13:27:59Z

i do know about that doc. are you saying that using "DNS-name:port" is completely broken now for you?

yes. if you want to deploy an HA cluster, it is best to set controlPlaneEndpoint to the LOAD_BALANCER_DNS instead of the LOAD_BALANCER IP

neolit123 · 2020-09-10T13:29:35Z

yes. if you want to deploy an HA cluster, it is best to set controlPlaneEndpoint to the LOAD_BALANCER_DNS instead of the LOAD_BALANCER IP

what error are you getting?

zhangguanzhang · 2020-09-10T13:37:44Z

yes. if you want to deploy an HA cluster, it is best to set controlPlaneEndpoint to the LOAD_BALANCER_DNS instead of the LOAD_BALANCER IP

what error are you getting?

I added some code to print the values; this is the error:

I0910 13:14:53.017570   21006 kubeconfig.go:84] creating kubeconfig file for controller-manager.conf
currentConfig.Clusters https://apiserver.cluster.local:6443 
config.Clusters:  https://192.168.160.243:6443
error execution phase kubeconfig/controller-manager: a kubeconfig file "/etc/kubernetes/controller-manager.conf" exists already but has got the wrong API Server URL

neolit123 · 2020-09-10T13:39:48Z

ok, so you have the same error as the user reporting above.

we can fix this for 1.19.2

one workaround is:

start kubeadm "init" with kubeconfig files using the local endpoint (instead of control-plane-endpoint)
wait for init to finish
modify the kubeconfig files again
restart the kube-scheduler and kube-controller-manager

zhangguanzhang · 2020-09-10T13:59:54Z

Both kube-scheduler and kube-controller-manager can connect to kube-apiserver through either localhost or the load balancer, but users cannot be forced to use localhost; a warning can be used instead of an error.

fabriziopandini · 2020-09-10T14:43:06Z

@neolit123 I'm +1 to relax the checks on the address in the existing kubeconfig file.
We can either remove the check or make it more flexible by checking whether the address is either the CPE or the local API endpoint (LAPI).
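A minimal sketch of the "more flexible" option: accept an existing server URL if it matches either the CPE URL or the local API endpoint URL. `serverURLAcceptable` is a hypothetical name, and the comparison is deliberately a plain string match (no URL normalization).

```go
package main

import "fmt"

// serverURLAcceptable is a hypothetical check following the suggestion above:
// the server URL found in an existing kubeconfig file is accepted if it equals
// either the controlPlaneEndpoint (CPE) URL or the local API endpoint (LAPI) URL.
func serverURLAcceptable(current, cpeURL, localURL string) bool {
	return current == cpeURL || current == localURL
}

func main() {
	cpe := "https://apiserver.cluster.local:6443"
	local := "https://192.168.160.243:6443"
	fmt.Println(serverURLAcceptable(cpe, cpe, local))                // true
	fmt.Println(serverURLAcceptable(local, cpe, local))              // true
	fmt.Println(serverURLAcceptable("https://foo:6443", cpe, local)) // false
}
```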

oldthreefeng · 2020-09-11T01:41:00Z

@neolit123 here is the example. i just edited the code to add a log print:
https://github.com/neolit123/kubernetes/blob/d9441906c4155173ce1a75421d8fcd1d2f79c471/cmd/kubeadm/app/phases/kubeconfig/kubeconfig.go#L225

fmt.Println("currentConfig.Clusters[currentCluster].Server:", currentConfig.Clusters[currentCluster].Server, "\nconfig.Clusters[expectedCluster].Server: ", config.Clusters[expectedCluster].Server)

I used the method CreateJoinControlPlaneKubeConfigFiles with controlPlaneEndpoint to generate the kube-scheduler and kube-controller-manager kubeconfig files. In this situation controlPlaneEndpoint is set to LOAD_BALANCER_DNS:LOAD_BALANCER_PORT, since it is best to use LOAD_BALANCER_DNS instead of an IP.
Then I ran kubeadm init with LOAD_BALANCER_DNS:LOAD_BALANCER_PORT. The result is:

./kubeadm  init  --control-plane-endpoint  apiserver.cluster.local:6443
W0911 09:36:17.922135   63517 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.19.1
[preflight] Running pre-flight checks
	[WARNING FileExisting-socat]: socat not found in system path
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Using existing ca certificate authority
[certs] Using existing apiserver certificate and key on disk
[certs] Using existing apiserver-kubelet-client certificate and key on disk
[certs] Using existing front-proxy-ca certificate authority
[certs] Using existing front-proxy-client certificate and key on disk
[certs] Using existing etcd/ca certificate authority
[certs] Using existing etcd/server certificate and key on disk
[certs] Using existing etcd/peer certificate and key on disk
[certs] Using existing etcd/healthcheck-client certificate and key on disk
[certs] Using existing apiserver-etcd-client certificate and key on disk
[certs] Using the existing "sa" key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/admin.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/kubelet.conf"
currentConfig.Clusters[currentCluster].Server: https://apiserver.cluster.local:6443 
config.Clusters[expectedCluster].Server:  https://192.168.160.243:6443
error execution phase kubeconfig/controller-manager: a kubeconfig file "/etc/kubernetes/controller-manager.conf" exists already but has got the wrong API Server URL
To see the stack trace of this error execute with --v=5 or higher

neolit123 · 2020-09-14T14:35:42Z

i will send the PR in the next couple of days.
edit: kubernetes/kubernetes#94816

neolit123 · 2020-09-18T12:57:36Z

fix for 1.19.2 is here:
kubernetes/kubernetes#94890

neolit123 · 2020-09-18T13:01:57Z

to further summarize what is happening: after the changes above, kubeadm will no longer error out if the server URL in custom-provided kubeconfig files does not match the expected one. it will only show a warning.

example:

you have something like foo:6443 in scheduler.conf
kubeadm expects scheduler.conf to point to e.g. 192.168.0.108:6443 (local api server endpoint)
kubeadm will show you a warning when reading the provided kubeconfig file.
this allows you to modify the topology of your control-plane components, but you need to make sure the components work after such a customization.
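The relaxed behavior can be sketched as a validation that reports a mismatch as a warning and never returns an error. `validateServerURL` and its message text are hypothetical, not kubeadm's actual function or output; the URLs come from the example above.

```go
package main

import "fmt"

// validateServerURL is a hypothetical sketch of the relaxed check: a mismatch
// between the server URL in an existing kubeconfig file and the URL kubeadm
// expects no longer produces an error, only a warning message.
// An empty warning means the URLs matched.
func validateServerURL(file, current, expected string) (warning string, err error) {
	if current != expected {
		warning = fmt.Sprintf("kubeconfig file %q has server URL %s, expected %s",
			file, current, expected)
	}
	// Never fails: the user is responsible for making the customized
	// topology work.
	return warning, nil
}

func main() {
	w, err := validateServerURL("/etc/kubernetes/scheduler.conf",
		"https://foo:6443", "https://192.168.0.108:6443")
	if w != "" {
		fmt.Println("[WARNING]", w)
	}
	fmt.Println("error:", err) // error: <nil>
}
```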

jdef · 2020-09-18T14:36:16Z

fix for 1.19.2 is here:
kubernetes/kubernetes#94890

1.19.2 is already out. So this fix will target 1.19.3, yes?

neolit123 · 2020-09-18T16:17:28Z

Indeed, they pushed it out 2 days ago. Should be out with 1.19.3 then.

neolit123 · 2024-01-29T08:16:48Z

thanks for all the work on this @chrischdi
we could target 1.30 for it as it has been a long standing task, however it's still not clear to me how exactly users are affected.
it's the hairpin mode LB, correct?

we should probably talk more about it in the kubeadm office hours this week.

If this sounds good, I would be happy to help driving this forward. I don't know if this requires a KEP first?! Happy to receive some feedback :-)

given a FG was suggested and given it's a complex change, that is 1) breaking for users that anticipate a certain kubeadm phase order, and also 2) needs tests - i guess we need a KEP.

@pacoxu @SataQiu WDYT about this overall?
we need agreement on it, obviously.

my vote is +1, but i hope we don't break users in ways they cannot recover from.

if we agree on a KEP and a way forward you can omit the PRR (prod readiness review) as it's a non-target for kubeadm.
https://github.com/kubernetes/enhancements/blob/master/keps/README.md

pacoxu · 2024-01-29T08:30:37Z

When joining a new control-plane node, during the new EtcdLocalSubphase step, is the kubelet running in standalone mode at first?

It sounds doable.

For the upgrade process, should we add logic to make the kubelet config point to localhost?

chrischdi · 2024-01-29T09:43:23Z

it's the hairpin mode LB, correct?

I think I lack context on what "hairpin mode LB" is :-)

When joining a new control-plane node, during the new EtcdLocalSubphase step, is the kubelet running in standalone mode at first?

Yes. In the targeted implementation the kubelet already starts, but cannot yet join the cluster (because the referenced kube-apiserver will not become healthy until etcd has started and joined the cluster).
During EtcdLocalSubphase we then place the etcd static pod manifest and join etcd to the cluster.
Once etcd has joined, kube-apiserver becomes healthy and the kubelet bootstraps itself, while kubeadm waits for the bootstrap to complete.

neolit123 · 2024-01-29T09:55:48Z

it's the hairpin mode LB, correct?

I think I lack context on what "hairpin mode LB" is :-)

i think the CAPZ and the Azure LB were affected:
https://github.com/microsoft/Azure-ILB-hairpin

if we agree that this needs a KEP it can cover what problems we are trying to solve.
it's a disruptive change, so it needs to be warranted.

randomvariable · 2024-01-31T16:58:53Z

Yup, Azure is the most affected here: outbound traffic to an LB that routes back to the host making the request gets dropped (it looks like a hairpin).
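A deliberately simplified model of why this hurts components on control-plane nodes that talk to the CPE. It assumes a round-robin LB over all control-plane nodes (real Azure LBs use hash-based distribution) and that every request routed back to its sender is dropped. `droppedRequests` and the IPs are hypothetical.

```go
package main

import "fmt"

// droppedRequests simulates n requests sent from sourceIP through a
// round-robin load balancer over backends, and counts the requests that are
// routed back to sourceIP (which a hairpin-dropping LB would discard).
func droppedRequests(sourceIP string, backends []string, n int) int {
	if len(backends) == 0 {
		return 0
	}
	dropped := 0
	for i := 0; i < n; i++ {
		if backends[i%len(backends)] == sourceIP {
			dropped++
		}
	}
	return dropped
}

func main() {
	backends := []string{"10.0.0.4", "10.0.0.5", "10.0.0.6"} // hypothetical CP node IPs
	// A scheduler on 10.0.0.4 going through the LB loses 1 in 3 requests
	// in this model; talking to its local API server avoids the LB entirely.
	fmt.Println(droppedRequests("10.0.0.4", backends, 9)) // 3
}
```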

neolit123 · 2024-01-31T17:55:27Z

we spoke with @chrischdi about his proposal at the kubeadm meeting today (Wed 31st January 2024 - 9am PT)
there are some notes and also a recording.

@SataQiu @pacoxu i proposed that we should have a new feature gate for this. also a KEP, so that we can decide on some of the implementation details. please, LMK if you think a KEP is not needed, or if you have other comments at this stage.

pacoxu · 2024-02-01T04:59:38Z

A KEP would help to track it and FG is needed.

Should this KEP be tracked by release team? I think not.

neolit123 · 2024-02-01T06:31:58Z

A KEP would help to track it and FG is needed.

Should this KEP be tracked by release team? I think not.

we haven't been tracking kubeadm KEPs with release team for a number of releases.
they are more useful for SIGs with many KEPs like node, api, auth. PRR does not apply to kubeadm.
the overall process is also optional for us.

https://github.com/kubernetes/sig-release/tree/master/releases/release-1.30
enhancement freeze we can respect but it's not 100% mandatory.
code freeze we must respect IMO, because if we break something it creates noise for the whole k/k in unit tests. also e2e tests for the release team.

pacoxu · 2024-08-26T10:08:19Z

@chrischdi do you plan to promote this to beta in v1.32?

chrischdi · 2024-08-26T17:06:18Z

If we agree on that, I'm happy to.

We've already started using this feature gate in CAPI for v1.31 clusters, because otherwise upgrades to v1.31 break due to violating the Kubernetes version skew policy!

neolit123 · 2024-08-27T07:23:01Z

+1
we need to update docs, kep, e2e, k/k code for the beta.

chrischdi · 2024-10-22T07:28:40Z

@neolit123 : if I am right: code-freeze would be at 8th November 2024.

For graduating this to Beta the following was noted on the KEP as TODO:

Make feature gate to be enabled by default.
Gather feedback from developers and surveys.
Make unit and e2e test changes.
Update the feature gate documentation.
Document the new phases.
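The first TODO item ("enabled by default") changes how the gate resolves when the user sets nothing. A simplified, hypothetical model of that resolution: an explicit user setting wins, otherwise the stage default applies (assumed off for alpha, on for beta). `featureSpec` and `enabled` are illustrative names, not kubeadm's feature-gate code.

```go
package main

import "fmt"

// featureSpec is a hypothetical, simplified model of a feature gate.
type featureSpec struct {
	Default bool
	Stage   string // "alpha", "beta" or "GA"; informational only here
}

// enabled resolves a gate: an explicit user setting wins, otherwise the
// default for the gate's current stage applies. A nil map is allowed.
func enabled(spec featureSpec, userSetting map[string]bool, name string) bool {
	if v, ok := userSetting[name]; ok {
		return v
	}
	return spec.Default
}

func main() {
	const gate = "ControlPlaneKubeletLocalMode"
	alpha := featureSpec{Default: false, Stage: "alpha"}
	beta := featureSpec{Default: true, Stage: "beta"}

	fmt.Println(enabled(alpha, nil, gate))                          // false: opt-in while alpha
	fmt.Println(enabled(alpha, map[string]bool{gate: true}, gate))  // true: e.g. CAPI opting in
	fmt.Println(enabled(beta, nil, gate))                           // true: enabled by default
	fmt.Println(enabled(beta, map[string]bool{gate: false}, gate)) // false: users can opt out
}
```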

Due to upcoming PTO, I'm not sure if I am able to make this work for code freeze so to be realistic we may need to move this to 1.33 instead. WDYT?

chrischdi · 2024-10-22T07:29:45Z

Note: the feature itself already showed its value in CAPI and helps to not violate the version skew policy there!

neolit123 · 2024-10-22T08:08:47Z

i will move it to 1.33 @chrischdi
we should try to make these changes early in the next cycle.

Gather feedback from developers and surveys

this part is covered by CAPI since it enabled it by default, i guess.

chrischdi · 2024-10-22T10:42:40Z

i will move it to 1.33 @chrischdi we should try to make these changes early in the next cycle.

Gather feedback from developers and surveys

this part is covered by CAPI since it enabled it by default, i guess.

I'd say yes.

k8s-ci-robot assigned neolit123 Aug 31, 2020

neolit123 added kind/bug priority/critical-urgent labels Aug 31, 2020

neolit123 added this to the v1.20 milestone Aug 31, 2020

vincepri mentioned this issue Aug 31, 2020

Kubernetes upgrades to v1.19 are flaky kubernetes-sigs/cluster-api#3564

Closed

neolit123 added priority/important-soon and removed priority/critical-urgent labels Sep 1, 2020

neolit123 mentioned this issue Sep 1, 2020

kubeadm: make the scheduler and KCM connect to the local API endpoint kubernetes/kubernetes#94398

Merged

neolit123 mentioned this issue Sep 15, 2020

kubeadm: relax the validation of kubeconfig server URLs kubernetes/kubernetes#94816

Merged

chrischdi mentioned this issue Feb 1, 2024

kubeadm: make a control-plane's kubelet talk to the local API Server on kubeadm join. kubernetes/enhancements#4471

Open


neolit123 assigned chrischdi Feb 8, 2024

neolit123 added priority/backlog and removed priority/awaiting-more-evidence labels Feb 8, 2024

neolit123 modified the milestones: v1.30, v1.31 Apr 5, 2024

chrischdi mentioned this issue Jun 19, 2024

kubeadm: implement ControlPlaneKubeletLocalMode kubernetes/kubernetes#125582

Merged

neolit123 removed the lifecycle/frozen label Jun 27, 2024

This was referenced Jun 27, 2024

kinder: add test workflow for testing ControlPlaneKubeletLocalMode feature gate #3080

Merged

kubeadm: add test workflow for testing ControlPlaneKubeletLocalMode feature gate kubernetes/test-infra#32858

Merged

neolit123 modified the milestones: v1.31, v1.32 Aug 7, 2024

sbueringer mentioned this issue Sep 25, 2024

KEP-4212: Declarative Node Maintenance kubernetes/enhancements#4213

Open


neolit123 modified the milestones: v1.32, v1.33 Oct 22, 2024
