Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: let Windows CNS use the InClusterConfig #3248

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

rbtr
Copy link
Contributor

@rbtr rbtr commented Dec 6, 2024

Reason for Change:
As of AKS 1.30, service account tokens refresh every ~1 hour when OIDC is enabled. They were previously valid for a year.
This setkubeconfigpath.ps1 script was at some point necessary on Windows to create a valid kubeconfig for CNS (and NPM and others), but based on my testing we don't need it today. It copies the token from the token file to create a custom kubeconfig from a template at startup and then we pass CNS that file via --kubeconfig.

The script runs at startup and never re-runs, so the token that exists at Pod start is the token CNS will try to use forever. Yearly token lifespans were long enough that no CNS Pod was ever up long enough to hit token expiration.

This becomes an issue with hourly token lifespans. An hour after Pod start, the token becomes invalid and CNS can no longer auth to the API server. For PodSubnet clusters, this permanently prevents CNS from being able to scale the IPAM pool and provide more Pod IPs.

Issue Fixed:

Azure/AKS#4679

Requirements:

Notes:

@rbtr rbtr requested review from a team as code owners December 6, 2024 19:13
@rbtr rbtr self-assigned this Dec 6, 2024
@rbtr rbtr added cns Related to CNS. fix Fixes something. labels Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 2 out of 3 changed files in this pull request and generated no suggestions.

Files not reviewed (1)
  • cns/Dockerfile: Language not supported
Copy link
Contributor

@thatmattlong thatmattlong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no powershell.exe needed I think

@@ -32,8 +32,6 @@ EXPOSE 10090

# skopeo inspect docker://mcr.microsoft.com/oss/kubernetes/windows-host-process-containers-base-image:v1.0.0 --format "{{.Name}}@{{.Digest}}"
FROM mcr.microsoft.com/oss/kubernetes/windows-host-process-containers-base-image@sha256:b4c9637e032f667c52d1eccfa31ad8c63f1b035e8639f3f48a510536bf34032b as windows
COPY --from=builder /azure-container-networking/cns/kubeconfigtemplate.yaml kubeconfigtemplate.yaml
COPY --from=builder /azure-container-networking/npm/examples/windows/setkubeconfigpath.ps1 setkubeconfigpath.ps1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we may want to leave these around in the dockerfile for a bit in case the release of cns which has this removed gets ahead of the overlaymgr release that will remove them from the manifest (or overlaymgr rolls back)

Copy link
Contributor Author

@rbtr rbtr Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we won't be releasing CNS urgently, the production impact can be mitigated entirely through a daemonset change in AKS

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need to keep cns/kubeconfigtemplate.yaml and npm/examples/windows/setkubeconfigpath.ps1 in the repo any longer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

likely no but they are referenced in the NPM build, and we need to make sure nothing we're building out of this branch can land on k8s <1.28

@rbtr
Copy link
Contributor Author

rbtr commented Dec 9, 2024

/azp run Azure Container Networking PR

@rbtr rbtr requested a review from thatmattlong December 9, 2024 23:01
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

github-merge-queue bot pushed a commit to microsoft/retina that referenced this pull request Dec 12, 2024
# Description

As of [AKS 1.30](https://github.com/Azure/AKS/releases/tag/2024-06-09),
service account tokens refresh every ~1 hour when OIDC is enabled. They
were previously valid for a year.
[This setkubeconfigpath.ps1
script](https://github.com/Azure/azure-container-networking/blob/47b243c42fd16119a96ab6d06eb602ac2ce40e7d/npm/examples/windows/setkubeconfigpath.ps1)
was at some point necessary on Windows to create a valid kubeconfig for
Retina WIndows. It copies the token from the token file to create a
custom kubeconfig from a template at startup and then we pass Retina
Windows that file via --kubeconfig.

The script runs at startup and never re-runs, so the token that exists
at Pod start is the token CNS will try to use forever. Yearly token
lifespans were long enough that no Retina Windows Pod was ever up long
enough to hit token expiration.

This becomes an issue with hourly token lifespans. An hour after Pod
start, the token becomes invalid and Retina Windows can no longer auth
to the API server. For PodSubnet clusters, this permanently prevents
Retina Windows from being able to scale the IPAM pool and provide more
Pod IPs.

Fix for CNS which referenced:
Azure/azure-container-networking#3248

## Related Issue

If this pull request is related to any issue, please mention it here.
Additionally, make sure that the issue is assigned to you before
submitting this pull request.

## Checklist

- [x] I have read the [contributing
documentation](https://retina.sh/docs/contributing).
- [x] I signed and signed-off the commits (`git commit -S -s ...`). See
[this
documentation](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification)
on signing commits.
- [x] I have correctly attributed the author(s) of the code.
- [x] I have tested the changes locally.
- [x] I have followed the project's style guidelines.
- [ ] I have updated the documentation, if necessary.
- [ ] I have added tests, if applicable.

## Screenshots (if applicable) or Testing Completed

Please add any relevant screenshots or GIFs to showcase the changes
made.

## Additional Notes

Add any additional notes or context about the pull request here.

---

Please refer to the [CONTRIBUTING.md](../CONTRIBUTING.md) file for more
information on how to contribute to this project.
github-merge-queue bot pushed a commit to microsoft/retina that referenced this pull request Dec 12, 2024
# Description

As of [AKS 1.30](https://github.com/Azure/AKS/releases/tag/2024-06-09),
service account tokens refresh every ~1 hour when OIDC is enabled. They
were previously valid for a year.
[This setkubeconfigpath.ps1
script](https://github.com/Azure/azure-container-networking/blob/47b243c42fd16119a96ab6d06eb602ac2ce40e7d/npm/examples/windows/setkubeconfigpath.ps1)
was at some point necessary on Windows to create a valid kubeconfig for
Retina WIndows. It copies the token from the token file to create a
custom kubeconfig from a template at startup and then we pass Retina
Windows that file via --kubeconfig.

The script runs at startup and never re-runs, so the token that exists
at Pod start is the token CNS will try to use forever. Yearly token
lifespans were long enough that no Retina Windows Pod was ever up long
enough to hit token expiration.

This becomes an issue with hourly token lifespans. An hour after Pod
start, the token becomes invalid and Retina Windows can no longer auth
to the API server. For PodSubnet clusters, this permanently prevents
Retina Windows from being able to scale the IPAM pool and provide more
Pod IPs.

Fix for CNS which referenced:
Azure/azure-container-networking#3248

## Related Issue

If this pull request is related to any issue, please mention it here.
Additionally, make sure that the issue is assigned to you before
submitting this pull request.

## Checklist

- [x] I have read the [contributing
documentation](https://retina.sh/docs/contributing).
- [x] I signed and signed-off the commits (`git commit -S -s ...`). See
[this
documentation](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification)
on signing commits.
- [x] I have correctly attributed the author(s) of the code.
- [x] I have tested the changes locally.
- [x] I have followed the project's style guidelines.
- [ ] I have updated the documentation, if necessary.
- [ ] I have added tests, if applicable.

## Screenshots (if applicable) or Testing Completed

Please add any relevant screenshots or GIFs to showcase the changes
made.

## Additional Notes

Add any additional notes or context about the pull request here.

---

Please refer to the [CONTRIBUTING.md](../CONTRIBUTING.md) file for more
information on how to contribute to this project.
@rbtr rbtr force-pushed the fix/windows-tokens branch from 765c550 to c5b0288 Compare December 16, 2024 22:24
@rbtr
Copy link
Contributor Author

rbtr commented Dec 16, 2024

/azp run Azure Container Networking PR

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cns Related to CNS. fix Fixes something.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants