
NodeLocal DNS container hung on SIGTERM #453

Open
rtheis opened this issue Jun 4, 2021 · 43 comments

@rtheis

rtheis commented Jun 4, 2021

We are still hitting the same problem reported by #394. The test failure occurred on Kubernetes version 1.21 with NodeLocal DNS cache version 1.17.3.

To recap, the NodeLocal DNS container occasionally hangs while handling SIGTERM, so Kubernetes kills the container after the termination grace period expires. This leaves leftover iptables rules on the node, breaking DNS resolution. Our theory is that there is iptables lock contention between NodeLocal DNS, Calico, and/or Kubernetes.

@prameshj
Contributor

Doesn't nodelocaldns start up right away, though? The DNS downtime should be O(seconds). Is that what you observe?

It is possible for nodelocaldns to run into lock contention, but that usually has a log message. Anything in the logs?

@rtheis
Author

rtheis commented Jun 22, 2021

@prameshj I was unable to collect any valuable logs at the time of the latest failure. I assume there is some type of lock contention that causes the pod to hang on termination.

NodeLocal DNS does start up right away, but our test failure comes when we verify that DNS works after disabling NodeLocal DNS. If we restart NodeLocal DNS and then stop it again, that usually fixes the node.

@prameshj
Contributor

> NodeLocal DNS does start up right away, but our test failure comes when we verify that DNS works after disabling NodeLocal DNS.

Ah, I see. Just to confirm: 1) the test disables nodelocaldns, 2) the nodelocaldns pod gets stuck handling SIGTERM and is killed by kubelet, leaving iptables rules behind, 3) the test times out with a DNS failure?

Do you see a log line showing the SIGTERM being handled?

```go
log.Println("[INFO] SIGTERM: Shutting down servers then terminating")
```

It should call teardown in that case:

```go
caddy.OnProcessExit = append(caddy.OnProcessExit, func() { cache.TeardownNetworking() })
```

It is possible that the pod handles SIGTERM and tries to clean up iptables, but cannot get the lock. We expose a metric on port 9353 for nodelocaldns lock errors, but we do not increment it for delete errors. We should check errors from

```go
c.iptables.DeleteRule(rule.table, rule.chain, rule.args...)
```

and update the metric.
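A hypothetical sketch of that change (identifier names are illustrative, not node-cache's actual code): check the error from each rule deletion and bump the same counter used for setup errors, instead of silently dropping it.

```go
package main

import "fmt"

// rule mirrors the shape of an iptables rule spec (illustrative only).
type rule struct {
	table, chain string
	args         []string
}

// iptables abstracts the one call teardown makes, so the sketch is
// runnable without a real iptables binary.
type iptables interface {
	DeleteRule(table, chain string, args ...string) error
}

// setupErrCount stands in for the setup_errors_total metric on port 9353.
var setupErrCount float64

func teardownRules(ipt iptables, rules []rule) {
	for _, r := range rules {
		if err := ipt.DeleteRule(r.table, r.chain, r.args...); err != nil {
			setupErrCount++ // previously, delete errors were not counted
			fmt.Printf("[ERROR] failed to delete rule in %s/%s: %v\n", r.table, r.chain, err)
		}
	}
}

// fakeIPT fails every deletion, simulating a contended xtables lock.
type fakeIPT struct{}

func (fakeIPT) DeleteRule(table, chain string, args ...string) error {
	return fmt.Errorf("another app is currently holding the xtables lock")
}

func main() {
	teardownRules(fakeIPT{}, []rule{{table: "raw", chain: "PREROUTING"}, {table: "filter", chain: "INPUT"}})
	fmt.Println("setup_errors_total:", setupErrCount)
}
```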

@rtheis
Author

rtheis commented Jul 24, 2021

@prameshj That is correct. We've updated the termination grace period to 900 seconds and still see this problem, although the failure rate has been much lower than in the past. Given the long termination, it seems like there is a hang somewhere. We'll keep trying to collect more data when this problem occurs.

@rtheis
Author

rtheis commented Sep 14, 2021

We hit the problem again on Kubernetes version 1.20 with NodeLocal DNS version 1.17.3. Unfortunately, I don't have any additional debug data to provide.

@rtheis
Author

rtheis commented Nov 4, 2021

We hit this problem again on Kubernetes version 1.22 with NodeLocal DNS version 1.21.1. We are collecting debug data now to determine if we can find the root cause.

@prameshj
Contributor

prameshj commented Nov 5, 2021

Thanks. I have also opened #488 to count errors from rule deletions at teardown, in case that provides some hints.

@rtheis
Author

rtheis commented Nov 17, 2021

We hit the problem again on Kubernetes version 1.22 with NodeLocal DNS version 1.21.1. Here is the end of the log captured during pod termination:

[INFO] SIGTERM: Shutting down servers then terminating

@prameshj
Contributor

prameshj commented Dec 6, 2021

> We hit the problem again on Kubernetes version 1.22 with NodeLocal DNS version 1.21.1. Here is the end of the log captured during pod termination:
>
> [INFO] SIGTERM: Shutting down servers then terminating

Any metrics from node-cache?

@rtheis
Author

rtheis commented Dec 6, 2021

@prameshj unfortunately, I don't have any metrics captured when the failure occurred. What would you like us to collect?

@prameshj
Contributor

prameshj commented Dec 6, 2021


The "setup_errors_total" metric, which was modified in #488 to also increment on deletion failures. That PR is included in tags 1.21.2 and later.

@rtheis
Author

rtheis commented Dec 6, 2021

Thanks, we'll update our test to collect metrics once we pull in the latest NodeLocal DNS cache version.

@rtheis
Author

rtheis commented Jan 21, 2022

We were able to recreate the problem on NodeLocal DNS version 1.21.3. Here are the logs and metrics.

[INFO] SIGTERM: Shutting down servers then terminating
# HELP coredns_build_info A metric with a constant '1' value labeled by version, revision, and goversion from which CoreDNS was built.
# TYPE coredns_build_info gauge
coredns_build_info{goversion="go1.16.10",revision="",version="1.7.0"} 1
# HELP coredns_cache_entries The number of elements in the cache.
# TYPE coredns_cache_entries gauge
coredns_cache_entries{server="dns://169.254.20.10:53",type="denial"} 2
coredns_cache_entries{server="dns://169.254.20.10:53",type="success"} 0
coredns_cache_entries{server="dns://172.21.0.10:53",type="denial"} 8
coredns_cache_entries{server="dns://172.21.0.10:53",type="success"} 1
# HELP coredns_cache_hits_total The count of cache hits.
# TYPE coredns_cache_hits_total counter
coredns_cache_hits_total{server="dns://172.21.0.10:53",type="denial"} 882
coredns_cache_hits_total{server="dns://172.21.0.10:53",type="success"} 131
# HELP coredns_cache_misses_total The count of cache misses.
# TYPE coredns_cache_misses_total counter
coredns_cache_misses_total{server="dns://169.254.20.10:53"} 8
coredns_cache_misses_total{server="dns://172.21.0.10:53"} 51
# HELP coredns_dns_request_duration_seconds Histogram of the time (in seconds) each request took.
# TYPE coredns_dns_request_duration_seconds histogram
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="0.00025"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="0.0005"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="0.001"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="0.002"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="0.004"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="0.008"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="0.016"} 1
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="0.032"} 1
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="0.064"} 1
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="0.128"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="0.256"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="0.512"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="1.024"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="2.048"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="4.096"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="8.192"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone=".",le="+Inf"} 2
coredns_dns_request_duration_seconds_sum{server="dns://169.254.20.10:53",type="other",zone="."} 0.12080307600000001
coredns_dns_request_duration_seconds_count{server="dns://169.254.20.10:53",type="other",zone="."} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="0.00025"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="0.0005"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="0.001"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="0.002"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="0.004"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="0.008"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="0.016"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="0.032"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="0.064"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="0.128"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="0.256"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="0.512"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="1.024"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="2.048"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="4.096"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="8.192"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="cluster.local.",le="+Inf"} 2
coredns_dns_request_duration_seconds_sum{server="dns://169.254.20.10:53",type="other",zone="cluster.local."} 0.002413771
coredns_dns_request_duration_seconds_count{server="dns://169.254.20.10:53",type="other",zone="cluster.local."} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="0.00025"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="0.0005"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="0.001"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="0.002"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="0.004"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="0.008"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="0.016"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="0.032"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="0.064"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="0.128"} 1
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="0.256"} 1
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="0.512"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="1.024"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="2.048"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="4.096"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="8.192"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa.",le="+Inf"} 2
coredns_dns_request_duration_seconds_sum{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa."} 0.391487433
coredns_dns_request_duration_seconds_count{server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa."} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="0.00025"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="0.0005"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="0.001"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="0.002"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="0.004"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="0.008"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="0.016"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="0.032"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="0.064"} 0
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="0.128"} 1
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="0.256"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="0.512"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="1.024"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="2.048"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="4.096"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="8.192"} 2
coredns_dns_request_duration_seconds_bucket{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa.",le="+Inf"} 2
coredns_dns_request_duration_seconds_sum{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa."} 0.256938208
coredns_dns_request_duration_seconds_count{server="dns://169.254.20.10:53",type="other",zone="ip6.arpa."} 2
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="0.00025"} 508
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="0.0005"} 518
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="0.001"} 530
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="0.002"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="0.004"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="0.008"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="0.016"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="0.032"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="0.064"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="0.128"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="0.256"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="0.512"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="1.024"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="2.048"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="4.096"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="8.192"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="A",zone="cluster.local.",le="+Inf"} 532
coredns_dns_request_duration_seconds_sum{server="dns://172.21.0.10:53",type="A",zone="cluster.local."} 0.047663295000000064
coredns_dns_request_duration_seconds_count{server="dns://172.21.0.10:53",type="A",zone="cluster.local."} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="0.00025"} 500
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="0.0005"} 519
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="0.001"} 531
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="0.002"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="0.004"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="0.008"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="0.016"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="0.032"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="0.064"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="0.128"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="0.256"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="0.512"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="1.024"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="2.048"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="4.096"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="8.192"} 532
coredns_dns_request_duration_seconds_bucket{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local.",le="+Inf"} 532
coredns_dns_request_duration_seconds_sum{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local."} 0.046383935999999994
coredns_dns_request_duration_seconds_count{server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local."} 532
# HELP coredns_dns_request_size_bytes Size of the EDNS0 UDP buffer in bytes (64K for TCP).
# TYPE coredns_dns_request_size_bytes histogram
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="0"} 0
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="100"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="200"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="300"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="400"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="511"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="1023"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="2047"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="4095"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="8291"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="16000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="32000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="48000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="64000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="+Inf"} 2
coredns_dns_request_size_bytes_sum{proto="udp",server="dns://169.254.20.10:53",zone="."} 114
coredns_dns_request_size_bytes_count{proto="udp",server="dns://169.254.20.10:53",zone="."} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="0"} 0
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="100"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="200"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="300"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="400"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="511"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="1023"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="2047"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="4095"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="8291"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="16000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="32000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="48000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="64000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="+Inf"} 2
coredns_dns_request_size_bytes_sum{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local."} 142
coredns_dns_request_size_bytes_count{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local."} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="0"} 0
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="100"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="200"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="300"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="400"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="511"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="1023"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="2047"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="4095"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="8291"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="16000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="32000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="48000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="64000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="+Inf"} 2
coredns_dns_request_size_bytes_sum{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa."} 140
coredns_dns_request_size_bytes_count{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa."} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="0"} 0
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="100"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="200"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="300"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="400"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="511"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="1023"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="2047"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="4095"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="8291"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="16000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="32000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="48000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="64000"} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="+Inf"} 2
coredns_dns_request_size_bytes_sum{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa."} 132
coredns_dns_request_size_bytes_count{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa."} 2
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="0"} 0
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="100"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="200"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="300"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="400"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="511"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="1023"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="2047"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="4095"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="8291"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="16000"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="32000"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="48000"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="64000"} 1064
coredns_dns_request_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="+Inf"} 1064
coredns_dns_request_size_bytes_sum{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local."} 73948
coredns_dns_request_size_bytes_count{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local."} 1064
# HELP coredns_dns_requests_total Counter of DNS requests made per zone, protocol and family.
# TYPE coredns_dns_requests_total counter
coredns_dns_requests_total{family="1",proto="udp",server="dns://169.254.20.10:53",type="other",zone="."} 2
coredns_dns_requests_total{family="1",proto="udp",server="dns://169.254.20.10:53",type="other",zone="cluster.local."} 2
coredns_dns_requests_total{family="1",proto="udp",server="dns://169.254.20.10:53",type="other",zone="in-addr.arpa."} 2
coredns_dns_requests_total{family="1",proto="udp",server="dns://169.254.20.10:53",type="other",zone="ip6.arpa."} 2
coredns_dns_requests_total{family="1",proto="udp",server="dns://172.21.0.10:53",type="A",zone="cluster.local."} 532
coredns_dns_requests_total{family="1",proto="udp",server="dns://172.21.0.10:53",type="AAAA",zone="cluster.local."} 532
# HELP coredns_dns_response_size_bytes Size of the returned response in bytes.
# TYPE coredns_dns_response_size_bytes histogram
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="0"} 0
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="100"} 0
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="200"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="300"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="400"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="511"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="1023"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="2047"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="4095"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="8291"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="16000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="32000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="48000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="64000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone=".",le="+Inf"} 2
coredns_dns_response_size_bytes_sum{proto="udp",server="dns://169.254.20.10:53",zone="."} 264
coredns_dns_response_size_bytes_count{proto="udp",server="dns://169.254.20.10:53",zone="."} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="0"} 0
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="100"} 0
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="200"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="300"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="400"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="511"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="1023"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="2047"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="4095"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="8291"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="16000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="32000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="48000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="64000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local.",le="+Inf"} 2
coredns_dns_response_size_bytes_sum{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local."} 328
coredns_dns_response_size_bytes_count{proto="udp",server="dns://169.254.20.10:53",zone="cluster.local."} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="0"} 0
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="100"} 0
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="200"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="300"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="400"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="511"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="1023"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="2047"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="4095"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="8291"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="16000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="32000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="48000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="64000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa.",le="+Inf"} 2
coredns_dns_response_size_bytes_sum{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa."} 308
coredns_dns_response_size_bytes_count{proto="udp",server="dns://169.254.20.10:53",zone="in-addr.arpa."} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="0"} 0
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="100"} 0
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="200"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="300"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="400"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="511"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="1023"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="2047"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="4095"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="8291"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="16000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="32000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="48000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="64000"} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa.",le="+Inf"} 2
coredns_dns_response_size_bytes_sum{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa."} 284
coredns_dns_response_size_bytes_count{proto="udp",server="dns://169.254.20.10:53",zone="ip6.arpa."} 2
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="0"} 0
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="100"} 0
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="200"} 1064
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="300"} 1064
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="400"} 1064
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="511"} 1064
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="1023"} 1064
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="2047"} 1064
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="4095"} 1064
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="8291"} 1064
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="16000"} 1064
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="32000"} 1064
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="48000"} 1064
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="64000"} 1064
coredns_dns_response_size_bytes_bucket{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local.",le="+Inf"} 1064
coredns_dns_response_size_bytes_sum{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local."} 167447
coredns_dns_response_size_bytes_count{proto="udp",server="dns://172.21.0.10:53",zone="cluster.local."} 1064
# HELP coredns_dns_responses_total Counter of response status codes.
# TYPE coredns_dns_responses_total counter
coredns_dns_responses_total{rcode="NOERROR",server="dns://172.21.0.10:53",zone="cluster.local."} 266
coredns_dns_responses_total{rcode="NXDOMAIN",server="dns://169.254.20.10:53",zone="."} 2
coredns_dns_responses_total{rcode="NXDOMAIN",server="dns://169.254.20.10:53",zone="cluster.local."} 2
coredns_dns_responses_total{rcode="NXDOMAIN",server="dns://169.254.20.10:53",zone="in-addr.arpa."} 2
coredns_dns_responses_total{rcode="NXDOMAIN",server="dns://169.254.20.10:53",zone="ip6.arpa."} 2
coredns_dns_responses_total{rcode="NXDOMAIN",server="dns://172.21.0.10:53",zone="cluster.local."} 798
# HELP coredns_forward_max_concurrent_rejects_total Counter of the number of queries rejected because the concurrent queries were at maximum.
# TYPE coredns_forward_max_concurrent_rejects_total counter
coredns_forward_max_concurrent_rejects_total 0
# HELP coredns_forward_request_duration_seconds Histogram of the time each request took.
# TYPE coredns_forward_request_duration_seconds histogram
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="0.00025"} 0
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="0.0005"} 0
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="0.001"} 0
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="0.002"} 0
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="0.004"} 0
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="0.008"} 0
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="0.016"} 0
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="0.032"} 0
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="0.064"} 0
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="0.128"} 1
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="0.256"} 1
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="0.512"} 1
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="1.024"} 1
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="2.048"} 1
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="4.096"} 1
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="8.192"} 1
coredns_forward_request_duration_seconds_bucket{to="10.0.80.11:53",le="+Inf"} 1
coredns_forward_request_duration_seconds_sum{to="10.0.80.11:53"} 0.112503827
coredns_forward_request_duration_seconds_count{to="10.0.80.11:53"} 1
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="0.00025"} 1
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="0.0005"} 34
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="0.001"} 49
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="0.002"} 53
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="0.004"} 53
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="0.008"} 53
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="0.016"} 53
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="0.032"} 53
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="0.064"} 53
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="0.128"} 55
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="0.256"} 56
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="0.512"} 57
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="1.024"} 57
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="2.048"} 57
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="4.096"} 57
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="8.192"} 57
coredns_forward_request_duration_seconds_bucket{to="172.21.156.115:53",le="+Inf"} 57
coredns_forward_request_duration_seconds_sum{to="172.21.156.115:53"} 0.6737926949999999
coredns_forward_request_duration_seconds_count{to="172.21.156.115:53"} 57
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="0.00025"} 0
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="0.0005"} 0
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="0.001"} 0
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="0.002"} 0
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="0.004"} 0
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="0.008"} 0
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="0.016"} 1
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="0.032"} 1
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="0.064"} 1
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="0.128"} 1
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="0.256"} 1
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="0.512"} 1
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="1.024"} 1
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="2.048"} 1
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="4.096"} 1
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="8.192"} 1
coredns_forward_request_duration_seconds_bucket{to="8.8.8.8:53",le="+Inf"} 1
coredns_forward_request_duration_seconds_sum{to="8.8.8.8:53"} 0.008086901
coredns_forward_request_duration_seconds_count{to="8.8.8.8:53"} 1
# HELP coredns_forward_requests_total Counter of requests made per upstream.
# TYPE coredns_forward_requests_total counter
coredns_forward_requests_total{to="10.0.80.11:53"} 1
coredns_forward_requests_total{to="172.21.156.115:53"} 57
coredns_forward_requests_total{to="8.8.8.8:53"} 1
# HELP coredns_forward_responses_total Counter of requests made per upstream.
# TYPE coredns_forward_responses_total counter
coredns_forward_responses_total{rcode="NOERROR",to="172.21.156.115:53"} 9
coredns_forward_responses_total{rcode="NXDOMAIN",to="10.0.80.11:53"} 1
coredns_forward_responses_total{rcode="NXDOMAIN",to="172.21.156.115:53"} 48
coredns_forward_responses_total{rcode="NXDOMAIN",to="8.8.8.8:53"} 1
# HELP coredns_health_request_duration_seconds Histogram of the time (in seconds) each request took.
# TYPE coredns_health_request_duration_seconds histogram
coredns_health_request_duration_seconds_bucket{le="0.00025"} 0
coredns_health_request_duration_seconds_bucket{le="0.0005"} 20
coredns_health_request_duration_seconds_bucket{le="0.001"} 1074
coredns_health_request_duration_seconds_bucket{le="0.002"} 1107
coredns_health_request_duration_seconds_bucket{le="0.004"} 1117
coredns_health_request_duration_seconds_bucket{le="0.008"} 1119
coredns_health_request_duration_seconds_bucket{le="0.016"} 1119
coredns_health_request_duration_seconds_bucket{le="0.032"} 1119
coredns_health_request_duration_seconds_bucket{le="0.064"} 1119
coredns_health_request_duration_seconds_bucket{le="0.128"} 1119
coredns_health_request_duration_seconds_bucket{le="0.256"} 1119
coredns_health_request_duration_seconds_bucket{le="0.512"} 1119
coredns_health_request_duration_seconds_bucket{le="1.024"} 1119
coredns_health_request_duration_seconds_bucket{le="2.048"} 1119
coredns_health_request_duration_seconds_bucket{le="4.096"} 1119
coredns_health_request_duration_seconds_bucket{le="8.192"} 1119
coredns_health_request_duration_seconds_bucket{le="+Inf"} 1119
coredns_health_request_duration_seconds_sum 0.793689844
coredns_health_request_duration_seconds_count 1119
# HELP coredns_panics_total A metrics that counts the number of panics.
# TYPE coredns_panics_total counter
coredns_panics_total 0
# HELP coredns_plugin_enabled A metric that indicates whether a plugin is enabled on per server and zone basis.
# TYPE coredns_plugin_enabled gauge
coredns_plugin_enabled{name="cache",server="dns://169.254.20.10:53",zone="."} 1
coredns_plugin_enabled{name="cache",server="dns://169.254.20.10:53",zone="cluster.local."} 1
coredns_plugin_enabled{name="cache",server="dns://169.254.20.10:53",zone="in-addr.arpa."} 1
coredns_plugin_enabled{name="cache",server="dns://169.254.20.10:53",zone="ip6.arpa."} 1
coredns_plugin_enabled{name="cache",server="dns://172.21.0.10:53",zone="."} 1
coredns_plugin_enabled{name="cache",server="dns://172.21.0.10:53",zone="cluster.local."} 1
coredns_plugin_enabled{name="cache",server="dns://172.21.0.10:53",zone="in-addr.arpa."} 1
coredns_plugin_enabled{name="cache",server="dns://172.21.0.10:53",zone="ip6.arpa."} 1
coredns_plugin_enabled{name="errors",server="dns://169.254.20.10:53",zone="."} 1
coredns_plugin_enabled{name="errors",server="dns://169.254.20.10:53",zone="cluster.local."} 1
coredns_plugin_enabled{name="errors",server="dns://169.254.20.10:53",zone="in-addr.arpa."} 1
coredns_plugin_enabled{name="errors",server="dns://169.254.20.10:53",zone="ip6.arpa."} 1
coredns_plugin_enabled{name="errors",server="dns://172.21.0.10:53",zone="."} 1
coredns_plugin_enabled{name="errors",server="dns://172.21.0.10:53",zone="cluster.local."} 1
coredns_plugin_enabled{name="errors",server="dns://172.21.0.10:53",zone="in-addr.arpa."} 1
coredns_plugin_enabled{name="errors",server="dns://172.21.0.10:53",zone="ip6.arpa."} 1
coredns_plugin_enabled{name="forward",server="dns://169.254.20.10:53",zone="."} 1
coredns_plugin_enabled{name="forward",server="dns://169.254.20.10:53",zone="cluster.local."} 1
coredns_plugin_enabled{name="forward",server="dns://169.254.20.10:53",zone="in-addr.arpa."} 1
coredns_plugin_enabled{name="forward",server="dns://169.254.20.10:53",zone="ip6.arpa."} 1
coredns_plugin_enabled{name="forward",server="dns://172.21.0.10:53",zone="."} 1
coredns_plugin_enabled{name="forward",server="dns://172.21.0.10:53",zone="cluster.local."} 1
coredns_plugin_enabled{name="forward",server="dns://172.21.0.10:53",zone="in-addr.arpa."} 1
coredns_plugin_enabled{name="forward",server="dns://172.21.0.10:53",zone="ip6.arpa."} 1
coredns_plugin_enabled{name="log",server="dns://169.254.20.10:53",zone="."} 1
coredns_plugin_enabled{name="log",server="dns://169.254.20.10:53",zone="cluster.local."} 1
coredns_plugin_enabled{name="log",server="dns://169.254.20.10:53",zone="in-addr.arpa."} 1
coredns_plugin_enabled{name="log",server="dns://169.254.20.10:53",zone="ip6.arpa."} 1
coredns_plugin_enabled{name="log",server="dns://172.21.0.10:53",zone="."} 1
coredns_plugin_enabled{name="log",server="dns://172.21.0.10:53",zone="cluster.local."} 1
coredns_plugin_enabled{name="log",server="dns://172.21.0.10:53",zone="in-addr.arpa."} 1
coredns_plugin_enabled{name="log",server="dns://172.21.0.10:53",zone="ip6.arpa."} 1
coredns_plugin_enabled{name="loop",server="dns://169.254.20.10:53",zone="."} 1
coredns_plugin_enabled{name="loop",server="dns://169.254.20.10:53",zone="cluster.local."} 1
coredns_plugin_enabled{name="loop",server="dns://169.254.20.10:53",zone="in-addr.arpa."} 1
coredns_plugin_enabled{name="loop",server="dns://169.254.20.10:53",zone="ip6.arpa."} 1
coredns_plugin_enabled{name="loop",server="dns://172.21.0.10:53",zone="."} 1
coredns_plugin_enabled{name="loop",server="dns://172.21.0.10:53",zone="cluster.local."} 1
coredns_plugin_enabled{name="loop",server="dns://172.21.0.10:53",zone="in-addr.arpa."} 1
coredns_plugin_enabled{name="loop",server="dns://172.21.0.10:53",zone="ip6.arpa."} 1
coredns_plugin_enabled{name="prometheus",server="dns://169.254.20.10:53",zone="."} 1
coredns_plugin_enabled{name="prometheus",server="dns://169.254.20.10:53",zone="cluster.local."} 1
coredns_plugin_enabled{name="prometheus",server="dns://169.254.20.10:53",zone="in-addr.arpa."} 1
coredns_plugin_enabled{name="prometheus",server="dns://169.254.20.10:53",zone="ip6.arpa."} 1
coredns_plugin_enabled{name="prometheus",server="dns://172.21.0.10:53",zone="."} 1
coredns_plugin_enabled{name="prometheus",server="dns://172.21.0.10:53",zone="cluster.local."} 1
coredns_plugin_enabled{name="prometheus",server="dns://172.21.0.10:53",zone="in-addr.arpa."} 1
coredns_plugin_enabled{name="prometheus",server="dns://172.21.0.10:53",zone="ip6.arpa."} 1
# HELP coredns_reload_failed_total Counter of the number of failed reload attempts.
# TYPE coredns_reload_failed_total counter
coredns_reload_failed_total 0
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 2.5841e-05
go_gc_duration_seconds{quantile="0.25"} 8.2532e-05
go_gc_duration_seconds{quantile="0.5"} 0.000124131
go_gc_duration_seconds{quantile="0.75"} 0.000154354
go_gc_duration_seconds{quantile="1"} 0.000278483
go_gc_duration_seconds_sum 0.003190547
go_gc_duration_seconds_count 25
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 45
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.16.10"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 8.073456e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 5.320392e+07
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.457767e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 302576
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 1.504496677254594e-05
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 5.355248e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 8.073456e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 5.6008704e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 1.0051584e+07
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 36502
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 5.246976e+07
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 6.6060288e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.6427311328710146e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 339078
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 4800
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 123216
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 147456
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 1.2238992e+07
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 708273
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 1.048576e+06
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 1.048576e+06
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 7.4793992e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 11
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 2.5
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 20
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 3.8047744e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.64273003965e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 7.54192384e+08
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes 1.8446744073709552e+19

@prameshj
Contributor

Thanks for sharing this. However, this does not include the "setup_errors_total" metric, which is exposed on port 9353; the other CoreDNS metrics from the prometheus plugin are exposed on port 9253.

By any chance, would you be able to export these metrics to a dashboard, so we can see the values as a function of time?

However, the logs don't have an entry like "Failed deleting iptables rule" - so it does not look like an iptables lock error :(
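Once both endpoints are scraped to files, a quick way to surface the interesting counters is to filter for nonzero values. A minimal sketch, where the file name and sample values are assumptions (a real dump would come from something like `curl -s http://169.254.20.10:9353/metrics`, per the port split above):

```shell
# Stand-in for a saved dump of the 9353 endpoint (values here are hypothetical).
cat > setup-metrics.txt <<'EOF'
coredns_nodecache_setup_errors_total{errortype="iptables"} 0
coredns_nodecache_setup_errors_total{errortype="iptables_lock"} 3
EOF
# Print only the error types with a nonzero count.
grep '^coredns_nodecache_setup_errors_total' setup-metrics.txt | awk '$2 > 0 {print $1}'
```

A nonzero `iptables_lock` counter would point at the lock-contention theory; all zeros means the setup loop never recorded a failure.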

@rtheis
Author

rtheis commented Jan 24, 2022

@prameshj I'll fix our error collection to get metrics on port 9353.

@rtheis
Author

rtheis commented Feb 3, 2022

Here's recreate data for NodeLocal DNS version 1.21.3 on Kubernetes version 1.22:

Logs:

[INFO] SIGTERM: Shutting down servers then terminating

Metrics:

# HELP coredns_nodecache_setup_errors_total The number of errors during periodic network setup for node-cache
# TYPE coredns_nodecache_setup_errors_total counter
coredns_nodecache_setup_errors_total{errortype="configmap"} 0
coredns_nodecache_setup_errors_total{errortype="interface_add"} 0
coredns_nodecache_setup_errors_total{errortype="interface_check"} 0
coredns_nodecache_setup_errors_total{errortype="iptables"} 0
coredns_nodecache_setup_errors_total{errortype="iptables_lock"} 0
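Since these are monotonic counters, the "values as a function of time" view requested earlier can be approximated even without a dashboard by diffing two scrapes taken some interval apart. A minimal sketch, where the file names and values are assumptions:

```shell
# Two stand-in scrapes of the setup-error counter, taken some interval apart
# (a real scrape would come from the 9353 metrics endpoint).
cat > scrape1.txt <<'EOF'
coredns_nodecache_setup_errors_total{errortype="iptables_lock"} 0
EOF
cat > scrape2.txt <<'EOF'
coredns_nodecache_setup_errors_total{errortype="iptables_lock"} 3
EOF
# Join on the metric name and print the increase between the two scrapes.
join scrape1.txt scrape2.txt | awk '{print $1, $3 - $2}'
```

Any positive delta during the window when the pod was being terminated would narrow the failure down to the setup/teardown path.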

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 4, 2022
@rtheis
Author

rtheis commented May 4, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 4, 2022
@k8s-triage-robot

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 2, 2022
@rtheis
Author

rtheis commented Aug 2, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 2, 2022
@k8s-triage-robot

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 31, 2022
@rtheis
Author

rtheis commented Nov 1, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 1, 2022
@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 30, 2023
@rtheis
Author

rtheis commented Jan 30, 2023

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 30, 2023
@dpasiukevich
Member

Apologies for not taking a look, I will try to see what's going on within the week.

@rtheis
Author

rtheis commented Jan 30, 2023

> Apologies for not taking a look, I will try to see what's going on within the week.

Thank you. The problem continues but is hard to recreate. If there is any debug data that you'd like me to collect when we have a recreate, please let me know.
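One piece of node-side state worth capturing on a recreate is whatever iptables rules survive after the pod is killed. A minimal sketch of the check, where `rules.txt` stands in for `iptables-save` output from the affected node and the rule text is illustrative, not taken from the issue (169.254.20.10 is the link-local listen IP used in this cluster):

```shell
# Stand-in for `iptables-save` output captured on the node after
# nodelocaldns was disabled (rule text here is illustrative).
cat > rules.txt <<'EOF'
-A PREROUTING -d 169.254.20.10/32 -p udp -m udp --dport 53 -j NOTRACK
-A INPUT -d 169.254.20.10/32 -p udp -m udp --dport 53 -j ACCEPT
EOF
# With nodelocaldns disabled, any rule still referencing the link-local IP
# is leftover state that would blackhole DNS on the node.
if grep -q '169\.254\.20\.10' rules.txt; then
  echo "leftover nodelocaldns iptables rules found"
fi
```

Capturing this alongside `ip link` output for the dummy interface at the moment of the failure would confirm whether teardown stopped partway through.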

@willzhang

What happened

Kubernetes v1.25.6 with nodelocaldns 1.21.1 hits the same problem.

root@node1:~# kubectl get pods -A
NAMESPACE          NAME                                       READY   STATUS    RESTARTS      AGE
calico-apiserver   calico-apiserver-854448c89-brj8f           1/1     Running   0             6m46s
calico-apiserver   calico-apiserver-854448c89-j4vpv           1/1     Running   0             6m46s
calico-system      calico-kube-controllers-7bb667cfc6-vzbp4   1/1     Running   0             7m5s
calico-system      calico-node-q47hb                          1/1     Running   0             7m5s
calico-system      calico-typha-7d6d59f8f-55p5c               1/1     Running   0             7m5s
kube-system        coredns-5d5d4f8c5b-tzxbr                   1/1     Running   0             6m6s
kube-system        dns-autoscaler-7cfb7f9f95-7qk5f            1/1     Running   0             6m2s
kube-system        etcd-node1                                 1/1     Running   0             7m55s
kube-system        kube-apiserver-node1                       1/1     Running   0             8m
kube-system        kube-controller-manager-node1              1/1     Running   1             7m54s
kube-system        kube-scheduler-node1                       1/1     Running   1             7m54s
kube-system        nodelocaldns-g8r8r                         0/1     Error     2 (32s ago)   35s
tigera-operator    tigera-operator-6bb5669f85-665wb           1/1     Running   0             7m9s
root@node1:~# kubectl -n kube-system logs -f nodelocaldns-g8r8r 
2023/02/11 13:42:00 [INFO] Starting node-cache image: 1.21.1
2023/02/11 13:42:00 [INFO] Using Corefile /etc/coredns/Corefile
2023/02/11 13:42:00 [INFO] Using Pidfile 
2023/02/11 13:42:00 [ERROR] Failed to read node-cache coreFile /etc/coredns/Corefile.base - open /etc/coredns/Corefile.base: no such file or directory
2023/02/11 13:42:00 [INFO] Skipping kube-dns configmap sync as no directory was specified
.:53 on 169.254.25.10
cluster.local.:53 on 169.254.25.10
in-addr.arpa.:53 on 169.254.25.10
ip6.arpa.:53 on 169.254.25.10
[INFO] plugin/reload: Running configuration MD5 = adf97d6b4504ff12113ebb35f0c6413e
CoreDNS-1.7.0
linux/amd64, go1.16.8, 
[ERROR] plugin/errors: 2 4862263584182278023.3826447364325757841. HINFO: read udp 169.254.25.10:47232->169.254.25.10:53: i/o timeout
[ERROR] plugin/errors: 2 2408815618409886486.7089417576541627048.in-addr.arpa. HINFO: read tcp 192.168.72.31:46502->10.233.102.132:53: i/o timeout
[ERROR] plugin/errors: 2 2785916534782284493.260523573863106281.ip6.arpa. HINFO: read tcp 192.168.72.31:46500->10.233.102.132:53: i/o timeout
[FATAL] plugin/loop: Loop (169.254.25.10:47502 -> 169.254.25.10:53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 4862263584182278023.3826447364325757841."
root@node1:~# 

What I did

I used kubespray v2.21.0 to install Kubernetes v1.25.6.

1. On the first run I configured a wrong kubelet flag, so kubeadm init failed and I interrupted it.

2. After fixing the kubelet flag I re-ran the deployment (Ansible is idempotent). The cluster came up successfully, but the nodelocaldns pod fails to start.
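The FATAL loop error in the log above means node-cache's upstream for the "." zone points back at itself. One quick way to spot this is to check the resolv.conf kubelet hands to pods for loopback or link-local nameservers; the sample file below is illustrative, not taken from the reporter's node:

```shell
# Detect a self-referential upstream: a nameserver on 127.x or 169.254.x in
# the resolv.conf that CoreDNS forwards to will create a forwarding loop.
resolv=$(mktemp)
cat > "$resolv" <<'EOF'
nameserver 169.254.25.10
EOF
if grep -qE '^nameserver (127\.|169\.254\.)' "$resolv"; then
  echo "loopback/link-local upstream detected -> forwarding loop"
fi
```

On a real node, run the grep against the file kubelet's `--resolv-conf` (or the `resolvConf` field in its config) actually points at.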

@meetmatt
Copy link

meetmatt commented Mar 2, 2023

Hello, I'm experiencing a somewhat related issue with dns-node-cache, leading to an endless crashloop backoff.

[INFO] Using Corefile /etc/coredns/Corefile
[ERROR] Failed to read node-cache coreFile /etc/coredns/Corefile.base - open /etc/coredns/Corefile.base: no such file or directory
[ERROR] Failed to sync kube-dns config directory /etc/kube-dns, err: lstat /etc/kube-dns: no such file or directory
[ERROR] Failed to add non-existent interface nodelocaldns: operation not supported
[INFO] Added interface - nodelocaldns
[ERROR] Error checking dummy device nodelocaldns - operation not supported
listen tcp 169.254.25.10:9254: bind: cannot assign requested address

Installing k8s via KubeKey

@isugimpy
Copy link

isugimpy commented May 3, 2023

Just popping in here to say that I'm experiencing the same behavior as OP on Kubernetes v1.24.9 with node-local-dns v1.17.4. What I discovered earlier is that the SIGTERM at pod termination hangs and a SIGKILL happens. When the new pod starts up, all DNS traffic to it seems to fail. I haven't been able to validate this on a live node, this is based on forensics via logs and metrics, so I don't know if connections are simply timing out, or being refused, or if they manage to connect to the node-local-dns service and it's unable to make outbound calls to resolve things. I've been seeing this happen periodically but didn't pin down this being the issue until today. If I get a repro I can try to provide more data.

Logs just before the old pod dies:

May 2 01:42:13 node-local-dns-7drqn node-cache INFO [INFO] SIGTERM: Shutting down servers then terminating
May 2 01:42:13 node-local-dns-7drqn node-cache INFO [INFO] Tearing down
May 2 01:42:13 node-local-dns-7drqn node-cache WARNING [WARNING] Exiting iptables/interface check goroutine
May 2 01:42:21 node-local-dns-7drqn node-cache ERROR [ERROR] Untrapped signal, tearing down
May 2 01:42:21 node-local-dns-7drqn node-cache INFO [INFO] Tearing down

At startup of the new pod, I do see the log entry for adding the nodelocaldns interface, and the iptables rules, but nothing further happens from that point. Traffic to the pod's metrics port does work, and I was able to get metrics from it just fine, it just reported it never received another DNS request.

@isugimpy
Copy link

isugimpy commented May 3, 2023

An update here:

I managed to repro this today. There's definitely something unusual going on. What's happening is that at startup of the replacement pod, the iptables rules never get added. I see in the logs where it claims to add them via the Added back nodelocaldns rule entries. All the messages that I'd expect to be present there are in the logs. But if I do an iptables-save, those rules aren't present in the output. Deleting the pod and letting it be recreated never fixes this either. Something about this is permanently poisoning the machine such that the node-cache binary thinks it's adding rules and they never make it in. I have gone to a healthy, working node and grabbed the relevant rules from it to make sure it's not something on the iptables side failing, and when I insert them into the chain they do insert successfully, so there has to be some kind of bug in the insertion process in the node-cache binary.
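A way to cross-check the forensics above is to compare what node-cache claims in its logs against the live ruleset. The sketch below greps a captured dump; on a real node you would pipe `iptables-save` directly. The sample NOTRACK rules and the default link-local IP 169.254.20.10 are illustrative (kubespray deployments use 169.254.25.10):

```shell
# Count ruleset entries that mention node-cache's link-local IP.
# A count of 0 on a node whose logs say "Added back nodelocaldns rule"
# matches the symptom described above: the insert never landed.
LOCAL_IP="169.254.20.10"
dump=$(cat <<'EOF'
*raw
-A PREROUTING -d 169.254.20.10/32 -j NOTRACK
-A OUTPUT -s 169.254.20.10/32 -j NOTRACK
COMMIT
EOF
)
count=$(printf '%s\n' "$dump" | grep -c "$LOCAL_IP")
echo "rules mentioning $LOCAL_IP: $count"
```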

@dpasiukevich
Copy link
Member

Nice find!

The nodelocaldns uses "k8s.io/kubernetes/pkg/util/iptables" to manage iptables rules.

node-local-dns v1.17.4 is somewhat old and it uses k8s.io/kubernetes v0.0.0-00010101000000-000000000000
The latest node-local-dns images (e.g. >= 1.22.19) use: k8s.io/kubernetes v1.24.10

@isugimpy could you try with the nodelocaldns 1.22.20 to see if the newer iptables client would work correctly?

As for the nodelocaldns iptables usage it's trivial and seems correct to me: source

  1. Try to insert the rule.
  2. Log if it already exists.
  3. Log at info level on success.
  4. Log the error on failure.
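The steps above can be sketched as a check-then-insert pattern, expressed here in shell rather than Go. The `-w` flag makes iptables wait for the xtables lock instead of failing under contention with Calico or kube-proxy, which is the suspected trigger in this issue; the table, chain, and rule arguments are illustrative:

```shell
# Check-then-insert, mirroring the iptables util's EnsureRule behavior:
# skip if present, insert otherwise, report the error if the insert fails.
ensure_rule() {
  table="$1"; chain="$2"; shift 2
  if iptables -w -t "$table" -C "$chain" "$@" 2>/dev/null; then
    echo "rule already exists"
  elif iptables -w -t "$table" -I "$chain" "$@"; then
    echo "added rule"
  else
    echo "failed to add rule" >&2
    return 1
  fi
}
# example: ensure_rule raw PREROUTING -d 169.254.20.10/32 -j NOTRACK
```

Note that without `-w`, a concurrent iptables writer holding the lock makes both the `-C` and the `-I` fail, which could explain rules silently not landing.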

@agilanbtdw
Copy link

Fixed this issue with these steps:

  1. Edit the resolvConf value from /run/systemd/resolve/resolv.conf to /etc/resolv.conf in the kubelet-config.yaml file on every node showing this nodelocaldns pod error. Before this change, CoreDNS picked up local/loopback addresses as upstream servers from /run/systemd/resolve/resolv.conf; my /etc/resolv.conf contains 8.8.8.8 and 8.8.4.4 as nameservers. More on this here.
  2. sudo systemctl restart kubelet
  3. kubectl delete pod <nodelocaldns pod name that is not running> -n kube-system
  4. The recreated pod ran. Some stubborn nodes needed a reboot first.
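Step 1 above can be sketched against a scratch copy of the kubelet config; the real path is distribution-specific (e.g. /var/lib/kubelet/config.yaml on kubeadm-provisioned nodes), so the file and its contents here are assumptions:

```shell
# Rewrite the resolvConf field so kubelet hands pods an upstream resolv.conf
# without loopback nameservers (demoed on a temp copy, not the live config).
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
kind: KubeletConfiguration
resolvConf: /run/systemd/resolve/resolv.conf
EOF
sed -i 's|^resolvConf: .*|resolvConf: /etc/resolv.conf|' "$cfg"
grep '^resolvConf' "$cfg"
```

On the real node, follow the edit with `sudo systemctl restart kubelet` and delete the stuck nodelocaldns pod, as in steps 2 and 3.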

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 20, 2024
@rtheis
Copy link
Author

rtheis commented Jan 21, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 21, 2024
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 20, 2024
@rtheis
Copy link
Author

rtheis commented Apr 20, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 20, 2024
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 19, 2024
@rtheis
Copy link
Author

rtheis commented Jul 19, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 19, 2024
@yahalomimaor
Copy link

Hi,
I'm facing the same issue as well. I get an "iptables lock" error and then the pod stops responding.
I'm using the latest version, 1.22.28.
Can you please advise why this happens and how it can be mitigated?

@rtheis
Copy link
Author

rtheis commented Aug 15, 2024

@yahalomimaor in the past, I was able to fix the problem by recreating the NodeLocal DNS pod on the node encountering the problem.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 13, 2024
@rtheis
Copy link
Author

rtheis commented Nov 13, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 13, 2024
@k8s-ci-robot
Copy link
Contributor

@yahalomimaor: You can't close an active issue/PR unless you authored it or you are a collaborator.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
