
make components on control-plane nodes point to the local API server endpoint #2271

Open
2 tasks done
neolit123 opened this issue Aug 31, 2020 · 61 comments
Labels
kind/design Categorizes issue or PR as related to design. kind/feature Categorizes issue or PR as related to a new feature. priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@neolit123
Member

neolit123 commented Aug 31, 2020

in CAPI immutable upgrades we saw a problem where a joining 1.19 node cannot bootstrap if a 1.19 KCM takes leadership and tries to send a CSR to a 1.18 API server on an existing Node. this happens because the CSR API graduated to v1 in 1.19, and a KCM is supposed to talk only to an API server of version N or N+1.

a better explanation here:
https://kubernetes.slack.com/archives/C8TSNPY4T/p1598907959059100?thread_ts=1598899864.038100&cid=C8TSNPY4T


optionally we should see if we can make the kubelet on control-plane Nodes bootstrap via the local API server instead of using the CPE. this might be a bit tricky and needs investigation. we could at least post-fix the kubelet.conf to point to the local API server after the bootstrap has finished.
see kubernetes/kubernetes#80774 for a related discussion
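as a rough illustration (not the actual implementation), such a post-fix could be done manually along these lines; the address, port and restart method here are assumptions and depend on the node:

# hypothetical manual post-fix: re-point kubelet.conf at this node's API server
LOCAL_APISERVER="https://192.168.0.108:6443"   # assumption: this node's advertise address and port
sed -i "s#server: .*#server: ${LOCAL_APISERVER}#" /etc/kubernetes/kubelet.conf
systemctl restart kubelet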

this change requires a more detailed plan, a feature gate and a KEP

1.31

KEP:

1.32

TODO. Move the FG to beta?

@neolit123 neolit123 added kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Aug 31, 2020
@neolit123 neolit123 added this to the v1.20 milestone Aug 31, 2020
@neolit123 neolit123 added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Sep 1, 2020
@neolit123
Member Author

neolit123 commented Sep 1, 2020

first PR is here: kubernetes/kubernetes#94398

@neolit123
Member Author

we spoke about the kubelet.conf in the office hours today:

  • Pointing the kubelet to the local api server should work, but the kubelet-start phase has to happen after the control-plane manifests are written on disk for CP nodes.
  • Requires a phase reorder; we are considering using a feature gate.
  • This avoids skew problems of a new kubelet trying to bootstrap against an old api-server.
  • One less component to point to the CPE.

i'm going to experiment and see how it goes, but this cannot be backported to older releases as it is a breaking change for users of phases.

@zhangguanzhang

This breaks the rules: controlPlaneEndpoint may be a domain name, and if it is a domain, things will not run correctly after your change.

@neolit123
Member Author

This breaks the rules: controlPlaneEndpoint may be a domain name, and if it is a domain, things will not run correctly after your change.

can you clarify with examples?

@neolit123
Member Author

@jdef added a note that some comments were left invalid after the recent change:
https://github.com/kubernetes/kubernetes/pull/94398/files/d9441906c4155173ce1a75421d8fcd1d2f79c471#r486252360

this should be fixed in master.

@neolit123
Member Author

someone else added a comment on kubernetes/kubernetes#94398
but later deleted it:

When the CreateJoinControlPlaneKubeConfigFiles method is used with a controlPlaneEndpoint like apiserver.cluster.local to generate the config files, and kubeadm init --config=/root/kubeadm-config.yaml --upload-certs -v 5 is then run, an error like this occurs:

I0910 15:15:54.436430   52511 kubeconfig.go:84] creating kubeconfig file for controller-manager.conf
currentConfig.Clusters[currentCluster].Server:  https://apiserver.cluster.local:6443 
config.Clusters[expectedCluster].Server:  https://192.168.160.243:6443
a kubeconfig file "/etc/kubernetes/controller-manager.conf" exists already but has got the wrong API Server URL

this validation should be turned into a warning instead of an error. the components themselves would then fail if they don't point to a valid API server, so the user would know.

@zhangguanzhang

This breaks the rules: controlPlaneEndpoint may be a domain name, and if it is a domain, things will not run correctly after your change.

can you clarify with examples?

you can see this doc: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/#steps-for-the-first-control-plane-node

--control-plane-endpoint "LOAD_BALANCER_DNS:LOAD_BALANCER_PORT"

@neolit123
Member Author

neolit123 commented Sep 10, 2020

i do know about that doc. are you saying that using "DNS-name:port" is completely broken now for you? what error output are you seeing?
i did test this during my work on the changes and it worked fine.

@jdef

jdef commented Sep 10, 2020

someone else added a comment on kubernetes/kubernetes#94398
but later deleted it:

When the CreateJoinControlPlaneKubeConfigFiles method is used with a controlPlaneEndpoint like apiserver.cluster.local to generate the config files, and kubeadm init --config=/root/kubeadm-config.yaml --upload-certs -v 5 is then run, an error like this occurs:

I0910 15:15:54.436430   52511 kubeconfig.go:84] creating kubeconfig file for controller-manager.conf
currentConfig.Clusters[currentCluster].Server:  https://apiserver.cluster.local:6443 
config.Clusters[expectedCluster].Server:  https://192.168.160.243:6443
a kubeconfig file "/etc/kubernetes/controller-manager.conf" exists already but has got the wrong API Server URL

this validation should be turned into a warning instead of an error. the components themselves would then fail if they don't point to a valid API server, so the user would know.

yes, please. this just bit us when testing a workaround in a pre-1.19.1 cluster, whereby we tried manually updating clusters[].cluster.server in scheduler.conf and controller-manager.conf to point to localhost instead of the official control plane endpoint.

@zhangguanzhang

i do know about that doc. are you saying that using "DNS-name:port" is completely broken now for you?

yes. if you want to deploy an HA cluster, it is best to set controlPlaneEndpoint to the load balancer DNS name (LOAD_BALANCER_DNS) instead of the load balancer IP.

@neolit123
Member Author

yes. if you want to deploy an HA cluster, it is best to set controlPlaneEndpoint to the load balancer DNS name (LOAD_BALANCER_DNS) instead of the load balancer IP.

what error are you getting?

@zhangguanzhang

zhangguanzhang commented Sep 10, 2020

yes. if you want to deploy an HA cluster, it is best to set controlPlaneEndpoint to the load balancer DNS name (LOAD_BALANCER_DNS) instead of the load balancer IP.

what error are you getting?

I added some code to print the log; this is the error:

I0910 13:14:53.017570   21006 kubeconfig.go:84] creating kubeconfig file for controller-manager.conf
currentConfig.Clusters https://apiserver.cluster.local:6443 
config.Clusters:  https://192.168.160.243:6443
error execution phase kubeconfig/controller-manager: a kubeconfig file "/etc/kubernetes/controller-manager.conf" exists already but has got the wrong API Server URL

@neolit123
Member Author

neolit123 commented Sep 10, 2020

ok, so you have the same error as the user who reported it above.

we can fix this for 1.19.2

one workaround (sketched after this list) is:

  • start kubeadm "init" with kubeconfig files using the local endpoint (instead of control-plane-endpoint)
  • wait for init to finish
  • modify the kubeconfig files again
  • restart the kube-scheduler and kube-controller-manager
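
roughly, the last two steps could look like this (illustrative only; the endpoint value, the file paths and the cluster name "kubernetes" are assumptions for a default kubeadm setup and have to be adapted):

# re-point the component kubeconfigs at the control-plane endpoint after init has finished
CPE="https://apiserver.cluster.local:6443"
for conf in controller-manager scheduler; do
  kubectl config set-cluster kubernetes \
    --kubeconfig="/etc/kubernetes/${conf}.conf" \
    --server="${CPE}"
done
# restart the static pods by temporarily moving their manifests
mv /etc/kubernetes/manifests/kube-controller-manager.yaml /etc/kubernetes/manifests/kube-scheduler.yaml /tmp/
sleep 20   # give the kubelet time to stop the pods
mv /tmp/kube-controller-manager.yaml /tmp/kube-scheduler.yaml /etc/kubernetes/manifests/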

@zhangguanzhang

Both kube-scheduler and kube-controller-manager can connect to kube-apiserver via localhost or via the load balancer, but users cannot be forced to use localhost; a warning could be used instead of an error.

@fabriziopandini
Member

@neolit123 I'm +1 to relax the checks on the address in the existing kubeconfig file.
We can either remove the check or make it more flexible by checking whether the address is the CPE or the local API server endpoint.

@oldthreefeng

oldthreefeng commented Sep 11, 2020

@neolit123 here is the example. I just edited the code to add a log print:
https://github.com/neolit123/kubernetes/blob/d9441906c4155173ce1a75421d8fcd1d2f79c471/cmd/kubeadm/app/phases/kubeconfig/kubeconfig.go#L225

fmt.Println("currentConfig.Clusters[currentCluster].Server:", currentConfig.Clusters[currentCluster].Server, "\nconfig.Clusters[expectedCluster].Server: ", config.Clusters[expectedCluster].Server)

We use the CreateJoinControlPlaneKubeConfigFiles method with controlPlaneEndpoint to generate the kube-scheduler and kube-controller-manager kubeconfigs. In this situation, controlPlaneEndpoint is set to LOAD_BALANCER_DNS:LOAD_BALANCER_PORT (it is best to use LOAD_BALANCER_DNS instead of an IP).
Then kubeadm init is run with the same LOAD_BALANCER_DNS:LOAD_BALANCER_PORT. The result is:

./kubeadm  init  --control-plane-endpoint  apiserver.cluster.local:6443
W0911 09:36:17.922135   63517 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.19.1
[preflight] Running pre-flight checks
	[WARNING FileExisting-socat]: socat not found in system path
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Using existing ca certificate authority
[certs] Using existing apiserver certificate and key on disk
[certs] Using existing apiserver-kubelet-client certificate and key on disk
[certs] Using existing front-proxy-ca certificate authority
[certs] Using existing front-proxy-client certificate and key on disk
[certs] Using existing etcd/ca certificate authority
[certs] Using existing etcd/server certificate and key on disk
[certs] Using existing etcd/peer certificate and key on disk
[certs] Using existing etcd/healthcheck-client certificate and key on disk
[certs] Using existing apiserver-etcd-client certificate and key on disk
[certs] Using the existing "sa" key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/admin.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/kubelet.conf"
currentConfig.Clusters[currentCluster].Server: https://apiserver.cluster.local:6443 
config.Clusters[expectedCluster].Server:  https://192.168.160.243:6443
error execution phase kubeconfig/controller-manager: a kubeconfig file "/etc/kubernetes/controller-manager.conf" exists already but has got the wrong API Server URL
To see the stack trace of this error execute with --v=5 or higher

@neolit123
Member Author

neolit123 commented Sep 14, 2020

i will send the PR in the next couple of days.
edit: kubernetes/kubernetes#94816

@neolit123
Member Author

fix for 1.19.2 is here:
kubernetes/kubernetes#94890

@neolit123
Member Author

neolit123 commented Sep 18, 2020

to further summarize what is happening: after the changes above, kubeadm will no longer error out if the server URL in custom-provided kubeconfig files does not match the expected one. it will only show a warning.

example:

  • you have something like foo:6443 in scheduler.conf
  • kubeadm expects scheduler.conf to point to e.g. 192.168.0.108:6443 (the local API server endpoint)
  • kubeadm will show you a warning when reading the provided kubeconfig file.
  • this allows you to modify the topology of your control-plane components, but you need to make sure the components still work after such a customization (a quick way to check which endpoint a component points to is sketched below).
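
for example, one quick way to check which API server URL a component kubeconfig currently points to (the path shown is the scheduler's; adjust it for other components):

kubectl config view --kubeconfig=/etc/kubernetes/scheduler.conf \
  -o jsonpath='{.clusters[0].cluster.server}{"\n"}'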

@jdef

jdef commented Sep 18, 2020

fix for 1.19.2 is here:
kubernetes/kubernetes#94890

1.19.2 is already out. So this fix will target 1.19.3, yes?

@neolit123
Member Author

neolit123 commented Sep 18, 2020 via email

@neolit123
Member Author

thanks for all the work on this @chrischdi
we could target 1.30 for it as it has been a long-standing task; however, it's still not clear to me how exactly users are affected.
it's the hairpin mode LB, correct?

we should probably talk more about it in the kubeadm office hours this week.

If this sounds good, I would be happy to help drive this forward. I don't know if this requires a KEP first?! Happy to receive some feedback :-)

given a FG was suggested, and given it's a complex change that 1) breaks users who anticipate a certain kubeadm phase order and 2) needs tests, i guess we need a KEP.

@pacoxu @SataQiu WDYT about this overall?
we need agreement on it, obviously.

my vote is +1, but i hope we don't break users in ways that are not recoverable.

if we agree on a KEP and a way forward, you can omit the PRR (production readiness review) as it's a non-target for kubeadm.
https://github.com/kubernetes/enhancements/blob/master/keps/README.md

@pacoxu
Member

pacoxu commented Jan 29, 2024

When joining a new control-plane node, in the new EtcdLocalSubphase step, is the kubelet running in standalone mode at first?

It sounds doable.

For the upgrade process, should we add logic to point the kubelet config to localhost?

@chrischdi
Member

it's the hairpin mode LB, correct?

I think I lack context on what "hairpin mode LB" is :-)

When joining a new control-plane node, in the new EtcdLocalSubphase step, is the kubelet running in standalone mode at first?

Yes. In the targeted implementation the kubelet starts early but cannot yet join the cluster (because the referenced kube-apiserver will not become healthy until etcd has started and joined the cluster).
During EtcdLocalSubphase we then place the etcd static Pod manifest and join etcd to the cluster.
After it has joined, kube-apiserver becomes healthy and the kubelet bootstraps itself, while kubeadm starts waiting for the bootstrap to complete.

@neolit123
Member Author

neolit123 commented Jan 29, 2024

it's the hairpin mode LB, correct?

I think I lack context on what "hairpin mode LB" is :-)

i think the CAPZ and the Azure LB were affected:
https://github.com/microsoft/Azure-ILB-hairpin

if we agree that this needs a KEP, it can cover what problems we are trying to solve.
it's a disruptive change, thus it needs to be warranted.

@randomvariable
Member

Yup, Azure is the most affected here: outbound traffic to an LB that points back to the host making the request will be dropped (it looks like a hairpin).

@neolit123
Member Author

we spoke with @chrischdi about his proposal at the kubeadm meeting today (Wed 31 January 2024, 9am PT).
there are some notes and also a recording.

@SataQiu @pacoxu i proposed that we should have a new feature gate for this, and also a KEP, so that we can decide on some of the implementation details. please LMK if you think a KEP is not needed, or if you have other comments at this stage.

@pacoxu
Member

pacoxu commented Feb 1, 2024

A KEP would help to track it, and an FG is needed.

  • Should this KEP be tracked by the release team? I think not.

@neolit123
Member Author

neolit123 commented Feb 1, 2024

A KEP would help to track it, and an FG is needed.

  • Should this KEP be tracked by the release team? I think not.

we haven't been tracking kubeadm KEPs with the release team for a number of releases.
they are more useful for SIGs with many KEPs, like node, api, and auth. PRR does not apply to kubeadm.
the overall process is also optional for us.

https://github.com/kubernetes/sig-release/tree/master/releases/release-1.30
enhancement freeze we can respect, but it's not 100% mandatory.
code freeze we must respect IMO, because if we break something it creates noise for the whole of k/k in unit tests, and in e2e tests for the release team.

@neolit123 neolit123 added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Feb 8, 2024
@neolit123 neolit123 modified the milestones: v1.30, v1.31 Apr 5, 2024
@neolit123 neolit123 removed the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jun 27, 2024
@neolit123 neolit123 modified the milestones: v1.31, v1.32 Aug 7, 2024
@pacoxu
Member

pacoxu commented Aug 26, 2024

@chrischdi do you plan to promote this to beta in v1.32?

@chrischdi
Member

If we agree on that, I'm happy to.

We already started using this feature gate in CAPI for v1.31 clusters, because otherwise upgrades to v1.31 are broken due to violating the Kubernetes version skew!

@neolit123
Member Author

neolit123 commented Aug 27, 2024

+1
we need to update docs, kep, e2e, k/k code for the beta.

@chrischdi
Member

@neolit123: if I am right, code freeze would be on 8 November 2024.

For graduating this to Beta the following was noted on the KEP as TODO:

  • Make the feature gate enabled by default.
  • Gather feedback from developers and surveys.
  • Make unit and e2e test changes.
  • Update the feature gate documentation.
  • Document the new phases.

Due to upcoming PTO, I'm not sure I will be able to finish this work before code freeze, so to be realistic we may need to move this to 1.33 instead. WDYT?

@chrischdi
Member

Note: the feature has already shown its value in CAPI and helps avoid violating the version skew policy there!

@neolit123
Member Author

i will move it to 1.33 @chrischdi
we should try to make these changes early in the next cycle.

Gather feedback from developers and surveys

this part is covered by CAPI since it enabled it by default, i guess.

@neolit123 neolit123 modified the milestones: v1.32, v1.33 Oct 22, 2024
@chrischdi
Member

i will move it to 1.33 @chrischdi we should try to make these changes early in the next cycle.

Gather feedback from developers and surveys

this part is covered by CAPI since it enabled it by default, i guess.

I'd say yes.
