Proxy randomly closes during connection #2901

Open
DecathectZero opened this issue Nov 6, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@DecathectZero

DecathectZero commented Nov 6, 2024

Bug Description

After a properly established connection, I sometimes see the following error pop up out of nowhere:

Proxy error, connectivity issue or a bug.
                    Please report it to us on https://github.com/metalbear-co/mirrord/issues/new?assignees=&labels=bug&projects=&template=bug_report.yml
                    You can find the `mirrord-intproxy` logs in /tmp/mirrord.log.
                    connection closed

/tmp/mirrord.log:

  2024-11-06T18:35:27.624023Z  INFO mirrord_intproxy::layer_initializer: new session, process_info: ProcessInfo { pid: 73153, name: "node", cmdline: ["/Users/tyler/.volta/tools/image/node/20.11.1/bin/node", "--enable-source-maps", "--inspect", "/Users/tyler/voiceflow/platform/apps/runtime/build/main"], loaded: true }
    at mirrord/intproxy/src/layer_initializer.rs:63

  2024-11-06T18:42:14.387598Z  WARN mirrord_kube::api::kubernetes::portforwarder: finished retry strategy, closing connection, connect_info: AgentKubernetesConnectInfo { pod_name: "mirrord-agent-wk5eyjv6hc-vllzt", agent_port: 44128, namespace: Some("dev-tyler"), agent_version: Some("3.122.1") }
    at mirrord/kube/src/api/kubernetes/portforwarder.rs:172

  2024-11-06T18:42:14.395494Z ERROR mirrord_intproxy::agent_conn: failed to receive message from the agent, inner task down
    at mirrord/intproxy/src/agent_conn.rs:226

  2024-11-06T18:42:14.396023Z ERROR mirrord_intproxy: task AGENT_CONNECTION failed: agent unexpectedly closed connection
    at mirrord/intproxy/src/lib.rs:243

  2024-11-06T18:42:14.396986Z ERROR mirrord::internal_proxy: Internal proxy encountered an error, exiting, error: Main internal proxy logic failed: agent unexpectedly closed connection
    at mirrord/cli/src/internal_proxy.rs:146

Steps to Reproduce

N/A; hard to reproduce, happens randomly

Backtrace

No response

mirrord layer logs

No response

mirrord intproxy logs

No response

mirrord agent logs

No response

mirrord config

No response

mirrord CLI version

mirrord 3.122.1

mirrord-agent version

ghcr.io/metalbear-co/mirrord:3.122.1

mirrord-operator version (if relevant)

No response

plugin kind and version (if relevant)

No response

Your operating system and version

macOS Sonoma 14.4.1

Local process

Node.js

Local process version

Node.js v20.11.1

Additional Info

No response

@DecathectZero DecathectZero added the bug Something isn't working label Nov 6, 2024

@aviramha
Member

aviramha commented Nov 6, 2024

Hi,
Looking at the error, this seems to be an issue with kubectl port-forward (at the API level):

  2024-11-06T18:42:14.387598Z  WARN mirrord_kube::api::kubernetes::portforwarder: finished retry strategy, closing connection, connect_info: AgentKubernetesConnectInfo { pod_name: "mirrord-agent-wk5eyjv6hc-vllzt", agent_port: 44128, namespace: Some("dev-tyler"), agent_version: Some("3.122.1") }
    at mirrord/kube/src/api/kubernetes/portforwarder.rs:172

We know that kubectl port-forward tends to be flaky in some cases, but it's largely out of our control (we even do retries, as seen in the logs, to handle it; a rough sketch of that pattern is below).

You can check whether your kube-apiserver restarted or logged any errors, but other than that I don't think it's something we can solve from our side.

P.S. The operator doesn't use kubectl port-forward (it implements its own API), and we haven't seen a report of this issue from its users - so perhaps that could be a solution.
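
A hedged sketch (not mirrord's actual code) of a retry strategy around the kube crate's port-forward, in the spirit of the "finished retry strategy" warning above; the pod name, agent port, namespace, and retry count are placeholders taken from the logs, and kube needs its "ws" feature:

  // Sketch only: retry a pod port-forward with exponential backoff.
  use std::time::Duration;

  use k8s_openapi::api::core::v1::Pod;
  use kube::{Api, Client};

  const MAX_RETRIES: u32 = 5; // assumed limit, not mirrord's real one

  #[tokio::main]
  async fn main() -> anyhow::Result<()> {
      let client = Client::try_default().await?;
      // Namespace copied from the logs above.
      let pods: Api<Pod> = Api::namespaced(client, "dev-tyler");

      for attempt in 1..=MAX_RETRIES {
          // Pod name and agent port copied from the logs above.
          match pods.portforward("mirrord-agent-wk5eyjv6hc-vllzt", &[44128]).await {
              // Connection established; a real proxy would hand the
              // forwarded stream to its connection task here.
              Ok(_forwarder) => return Ok(()),
              Err(err) => {
                  eprintln!("port-forward attempt {attempt}/{MAX_RETRIES} failed: {err}");
                  tokio::time::sleep(Duration::from_secs(2u64.pow(attempt))).await;
              }
          }
      }
      // Corresponds to the "finished retry strategy, closing connection" warning.
      anyhow::bail!("finished retry strategy, closing connection")
  }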

@aviramha aviramha self-assigned this Nov 6, 2024
@DecathectZero
Author

It seems like the agent just dies when this happens. I retried with mirrord 3.124.0.

Same thing in mirrord-intproxy:

  2024-11-06T19:27:15.175003Z  WARN mirrord_kube::api::kubernetes::portforwarder: finished retry strategy, closing connection, connect_info: AgentKubernetesConnectInfo { pod_name: "mirrord-agent-nxavbqhb4z-6strh", agent_port: 32170, namespace: Some("dev-tyler"), agent_version: Some("3.122.1") }
    at mirrord/kube/src/api/kubernetes/portforwarder.rs:172

  2024-11-06T19:27:15.190863Z ERROR mirrord_intproxy::agent_conn: failed to receive message from the agent, inner task down
    at mirrord/intproxy/src/agent_conn.rs:226

  2024-11-06T19:27:15.191048Z ERROR mirrord_intproxy: task AGENT_CONNECTION failed: agent unexpectedly closed connection
    at mirrord/intproxy/src/lib.rs:243

  2024-11-06T19:27:15.193633Z ERROR mirrord::internal_proxy: Internal proxy encountered an error, exiting, error: Main internal proxy logic failed: agent unexpectedly closed connection
    at mirrord/cli/src/internal_proxy.rs:146

Member

aviramha commented Nov 6, 2024

The agent dying when the connection ends (because the port-forward died) makes sense - that's the desired logic. If the agent had crashed, I would've expected it to send a "bye" message. If you'd like to make sure the agent isn't dying for some other reason, I'd tail the agent's logs and see if you get any error besides "no connections alive, shutting down".
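
For example, a minimal sketch of following the agent pod's logs with the kube crate (plain kubectl logs -f on the agent pod works just as well; pod name and namespace are placeholders from the logs above):

  // Sketch only: follow the agent pod's logs and print each line,
  // watching for anything besides "no connections alive, shutting down".
  use futures::{AsyncBufReadExt, TryStreamExt};
  use k8s_openapi::api::core::v1::Pod;
  use kube::{api::LogParams, Api, Client};

  #[tokio::main]
  async fn main() -> anyhow::Result<()> {
      let client = Client::try_default().await?;
      let pods: Api<Pod> = Api::namespaced(client, "dev-tyler");

      let params = LogParams { follow: true, ..LogParams::default() };
      let reader = pods.log_stream("mirrord-agent-wk5eyjv6hc-vllzt", &params).await?;

      let mut lines = reader.lines();
      while let Some(line) = lines.try_next().await? {
          println!("{line}");
      }
      Ok(())
  }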

@DecathectZero
Author

Yup, after investigating more, port-forward is definitely the culprit. Is there any good way around this at all?
kubernetes/kubectl#1525 (comment)

We can look into the operator, since it's not using port-forward.

@aviramha
Member

aviramha commented Nov 7, 2024

  Yup, after investigating more, port-forward is definitely the culprit. Is there any good way around this at all? kubernetes/kubectl#1525 (comment)

  We can look into the operator, since it's not using port-forward.

Reading the issue you mentioned, I don't fully understand what's going on, since the kube crate we use already speaks WebSocket - but maybe k8s does WebSocket -> SPDY inside the apiserver?
In any case it looks like a longer-term thing (we'd need the kube crate to support the new connection type, if a change is needed), and it might be using it already, tbh.
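
For context, the kube crate only exposes port-forwarding behind its "ws" cargo feature, and the upgrade against the apiserver is negotiated over WebSocket; a minimal sketch, with pod name and port as placeholders from the logs above:

  // Sketch only: a single WebSocket-backed port-forward via kube.
  use k8s_openapi::api::core::v1::Pod;
  use kube::{Api, Client};

  #[tokio::main]
  async fn main() -> anyhow::Result<()> {
      let client = Client::try_default().await?;
      let pods: Api<Pod> = Api::namespaced(client, "dev-tyler");

      // The streaming upgrade here goes over WebSocket (the apiserver may
      // still speak SPDY to the kubelet on the other side).
      let mut forwarder = pods.portforward("mirrord-agent-nxavbqhb4z-6strh", &[32170]).await?;
      let _stream = forwarder.take_stream(32170).expect("port was requested above");
      // `_stream` is an AsyncRead + AsyncWrite duplex to the pod's port.
      Ok(())
  }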

@DecathectZero
Author

This is still happening quite frequently on our end.

On Kubernetes 1.27 it was not noticeable, but after upgrading to Kubernetes 1.30 it's happening quite often and has been disruptive.

I've enabled logs for the Kubernetes apiserver via AWS EKS, but nothing significant has appeared there. I can send them over to you via a private channel if you want.
Is there anything in particular you're looking for in the kube-apiserver logs?

@aviramha
Member

aviramha commented Nov 29, 2024

We found out that the user was running the Calico CNI with eBPF - we need to retest.

@aviramha aviramha removed their assignment Nov 29, 2024