
fix pod hostname to be podname for dns srv records #232

Closed
wants to merge 1 commit

Conversation

krmayankk

@krmayankk commented May 18, 2018

Fixes #116 and kubernetes/kubernetes#47992

Verified this works now

/ # nslookup my-service.new-hire
Server:    10.254.208.255
Address 1: 10.254.208.255 kube-dns.kube-system.svc.cluster.local

Name:      my-service.new-hire
Address 1: 10.251.156.52 nginx-deployment-6fc8cd7954-q8p99.my-service.new-hire.svc.cluster.local
Address 2: 10.251.156.7 nginx-deployment-6fc8cd7954-w44gr.my-service.new-hire.svc.cluster.local

@k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 18, 2018
@krmayankk
Author

/assign @thockin

@krmayankk
Author

/assign @bowei

@thockin
Member

thockin commented May 18, 2018

I don't think this is right - TargetRef is almost always a Pod; wouldn't this break all the functionality around pod.spec.hostname?

@krmayankk
Author

@thockin any other suggestions? If hostname is not empty we use the hostname, else we use the pod name - would that help? I am open to suggestions, since I am not aware of how the hostname is used, but it's completely broken when I create a deployment with three instances: all of them get the same hostname and I can't resolve their IPs using their DNS names.

@k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 18, 2018
@krmayankk
Author

@thockin addressed comments PTAL

@krmayankk
Author

@thockin @bowei PTAL

@thockin
Member

thockin commented May 24, 2018

This does not seem to address the problem:

When a headless service is created to point to pods which share a single hostname ...

You don't handle that case any differently than before?

@krmayankk
Author

krmayankk commented May 24, 2018

This does not seem to address the problem:

When a headless service is created to point to pods which share a single hostname ...

You don't handle that case any differently than before?

@thockin that is correct. I improved on the previous behavior: if no hostname is specified we now use the pod name, but if a hostname is set on the pod we still use that.

Earlier, all pods of the same deployment would get the same DNS entry, which resolved to just one of the pods' IPs when a hostname was specified; and when no hostname was specified, they would not get any DNS entry at all.
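
For concreteness, here is a minimal sketch of that fallback. It is illustrative only (not the actual kube-dns change), assuming records are built from the EndpointAddress entries of the headless service's Endpoints object:

package dns

import v1 "k8s.io/api/core/v1"

// endpointName is an illustrative sketch (not the actual kube-dns code) of
// the fallback described above: prefer the explicit hostname on the endpoint
// address, otherwise fall back to the name of the backing pod via TargetRef.
func endpointName(addr v1.EndpointAddress) string {
	if addr.Hostname != "" {
		return addr.Hostname
	}
	if addr.TargetRef != nil && addr.TargetRef.Kind == "Pod" {
		return addr.TargetRef.Name
	}
	return ""
}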

@thockin
Member

thockin commented May 29, 2018

Sorry, I am very confused as to which problem you are solving now.

A headless service with 3 backends will get 3 responses to the service's DNS name. This is correct behavior.

$ cat /tmp/svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: hostnames-headless
spec:
  ports:
  - port: 80
  selector:
    run: hostnames
  clusterIP: None
root@18066-7bd974f77d-5jrr4:/# dig +short +search hostnames-headless
10.64.0.35
10.64.2.31
10.64.2.32

If you add a hostname to the pod, you still get 3 responses:

root@7852-54d9d946d-pkxm8:/# dig +short +search hostnames-headless
10.64.0.36
10.64.0.37
10.64.2.36

Can you clarify what you're trying to solve again?
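
(For reference, "add a hostname to the pod" above means setting pod.spec.hostname and pod.spec.subdomain. A rough sketch of such a pod, with purely illustrative values:)

apiVersion: v1
kind: Pod
metadata:
  name: hostnames-0
  labels:
    run: hostnames
spec:
  hostname: foo                  # illustrative value
  subdomain: hostnames-headless  # must match the headless Service name
  containers:
  - name: hostnames
    image: nginx                 # illustrative image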

@krmayankk
Author

krmayankk commented May 31, 2018

@thockin curious whether your service was pointing to a Deployment or a StatefulSet. From the hostname it seems you were indeed pointing to a Deployment, but I just want to confirm. I am using kube-dns 1.14.10, and here is what I see with a headless service called my-spcy in namespace new-hire and hostname stupid-9.

  • Using a hostname in the deployment, my query for the headless service returns only one entry (with both nslookup and dig), the lookup for the pod name fails, and the reverse lookup for the IP returns only the IP itself with no meaningful name. This makes sense to me, since the code that derives the hostname always returns the provided hostname stupid-9 for both pods, so both pods are entered into the record with the same name and the last one wins.
nslookup my-spcy.new-hire
Server:    10.254.208.255
Address 1: 10.254.208.255 kube-dns.kube-system.svc.cluster.local

Name:      my-spcy.new-hire
Address 1: 10.251.156.66 stupid-9.my-spcy.new-hire.svc.cluster.local
nslookup nginx-deployment-6fc8cd7954-7jvbt.my-spcy.new-hire
Server:    10.254.208.255
Address 1: 10.254.208.255 kube-dns.kube-system.svc.cluster.local

nslookup: can't resolve 'nginx-deployment-6fc8cd7954-7jvbt.my-spcy.new-hire'
/ # nslookup 10.251.156.183
Server:    10.254.208.255
Address 1: 10.254.208.255 kube-dns.kube-system.svc.cluster.local

Name:      10.251.156.183
Address 1: 10.251.156.183
  • Without a hostname, my DNS lookup for the headless service succeeds and returns both pods' IPs, but again the pod name lookup fails and the reverse lookup for the IP returns only the IP itself.
/ # nslookup my-spcy.new-hire
Server:    10.254.208.255
Address 1: 10.254.208.255 kube-dns.kube-system.svc.cluster.local

Name:      my-spcy.new-hire
Address 1: 10.251.156.116
Address 2: 10.251.156.183
 nslookup nginx-deployment-56c8f4cd8c-grmlm.my-spcy.new-hire
Server:    10.254.208.255
Address 1: 10.254.208.255 kube-dns.kube-system.svc.cluster.local

nslookup: can't resolve 'nginx-deployment-56c8f4cd8c-grmlm.my-spcy.new-hire'
  • With the fix I am proposing, the nslookup for the headless service properly returns the IP and pod name, and both the pod name lookup and the reverse lookup for the IP work.
/ # nslookup my-spcy.new-hire
Server:    10.254.208.255
Address 1: 10.254.208.255 kube-dns.kube-system.svc.cluster.local

Name:      my-spcy.new-hire
Address 1: 10.251.158.118 nginx-deployment-56d55c7b6-ldnxc.my-spcy.new-hire.svc.cluster.local
Address 2: 10.251.158.83 nginx-deployment-56d55c7b6-wv99r.my-spcy.new-hire.svc.cluster.local
/ # nslookup  nginx-deployment-56d55c7b6-ldnxc.my-spcy.new-hire
Server:    10.254.208.255
Address 1: 10.254.208.255 kube-dns.kube-system.svc.cluster.local

Name:      nginx-deployment-56d55c7b6-ldnxc.my-spcy.new-hire
Address 1: 10.251.158.118 nginx-deployment-56d55c7b6-ldnxc.my-spcy.new-hire.svc.cluster.local
/ # nslookup  nginx-deployment-56d55c7b6-ldnxc.my-spcy.new-hire^C
/ # nslookup nginx-deployment-56d55c7b6-wv99r.my-spcy.new-hire
Server:    10.254.208.255
Address 1: 10.254.208.255 kube-dns.kube-system.svc.cluster.local

Name:      nginx-deployment-56d55c7b6-wv99r.my-spcy.new-hire
Address 1: 10.251.158.83 nginx-deployment-56d55c7b6-wv99r.my-spcy.new-hire.svc.cluster.local
/ # nslookup 10.251.158.83
Server:    10.254.208.255
Address 1: 10.254.208.255 kube-dns.kube-system.svc.cluster.local

Name:      10.251.158.83
Address 1: 10.251.158.83 nginx-deployment-56d55c7b6-wv99r.my-spcy.new-hire.svc.cluster.local

Let me know if that clarifies things. I should have put all of this in the description, though.

@thockin
Member

thockin commented Jun 1, 2018

Ahh, #116 is all about not reporting multiple IPs when the same hostname is used, not about using the pod's name, so this doesn't fix #116 at all. It changes behavior to use the pod name rather than an anonymous name.

What is the use case for that?

If the intent is to fix #116, we need to handle duplicates explicitly

@krmayankk
Author

krmayankk commented Jun 5, 2018

@thockin one of the examples the author of the issue mentions in #116 is the following, which shows that when resolving the headless service, only one A record is returned. This PR fixes that problem.

# host depl-1-service.default.svc.cluster.local
depl-1-service.default.svc.cluster.local has address 10.56.0.140

Also see kubernetes/kubernetes#47992 and the similar changes in CoreDNS here: https://github.com/coredns/coredns/pull/1190/files

If the intent is to fix #116, we need to handle duplicates explicitly

Can you elaborate? The pod names are always different. Do you want to explicitly check that the pod names do not collide?

@thockin
Member

thockin commented Jun 5, 2018

You're not handling the case of hostname being set at all. You're also not handling bogus TargetRef.Name collisions.

The bugs effectively say 2 things:

  1. That a headless service with no hostname field does not get a DNS name per pod. That's working as intended. If we want to use the pod name instead of nothing, we can consider that, but it's actually a change in the DNS specification for Kubernetes and would need to propagate into CoreDNS, too. @johnbelamaric

  2. That a headless service with a hostname field gets a single DNS record. That's a bug. It should probably build a list for the leaf name.

As far as I can see you are only addressing point 1, right? That seems like the less important issue of the two.
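
Concretely, "build a list for the leaf name" could look something like this on the kube-dns side. This is an illustrative sketch only, not the actual implementation; the types are what the Endpoints API exposes:

package dns

import v1 "k8s.io/api/core/v1"

// recordsByLeafName is an illustrative sketch (not the kube-dns
// implementation): group endpoint IPs by their leaf name so that pods
// sharing a hostname yield multiple A records for that name instead of
// the last entry silently overwriting the others.
func recordsByLeafName(addrs []v1.EndpointAddress) map[string][]string {
	byName := map[string][]string{}
	for _, a := range addrs {
		name := a.Hostname
		if name == "" && a.TargetRef != nil && a.TargetRef.Kind == "Pod" {
			name = a.TargetRef.Name // the pod-name fallback this PR proposes
		}
		if name == "" {
			continue
		}
		byName[name] = append(byName[name], a.IP)
	}
	return byName
}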

@krmayankk
Author

krmayankk commented Jun 7, 2018

@thockin yes, I am only addressing 1 so far. I am happy to update the DNS specification. @johnbelamaric how is this supported in CoreDNS? This fixes DNS resolution of the pod name and reverse lookups.

Regarding not handling bogus TargetRef.Name collisions, I am having a hard time thinking of a case where TargetRef.Name will collide. Even though a Service can span deployment objects, within a namespace the pod names will never collide. Can a Service selector span namespaces? Also, yes, a Service can span Pods created manually, which can lead to collisions, but no one uses Pods without a workload controller.

For handling case 2 (hostname specified), I am not sure why the hostname was introduced at all. It seems it would only help when the pod is created on its own, without a ReplicaSet or Deployment, which is hardly ever the case.

Should we instead use the dashed form of the IP address as the name of the A record when a hostname is specified, or even always? CoreDNS does exactly that in this PR: https://github.com/coredns/coredns/pull/1190/files. Any other suggestions?
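
(The dashed form is simply the endpoint IP with dots replaced by dashes, e.g. 10.251.158.83 becomes 10-251-158-83. A trivial, purely illustrative helper:)

package dns

import "strings"

// dashedIPName returns the IP-based endpoint name used when no hostname is
// available, e.g. "10.251.158.83" -> "10-251-158-83" (illustrative only).
func dashedIPName(ip string) string {
	return strings.Replace(ip, ".", "-", -1)
}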

@krmayankk
Author

@thockin I am happy to discuss on a Hangout if that will help us move faster.

@krmayankk
Author

Ping @thockin

@johnbelamaric
Member

@johnbelamaric how is this supported in CoreDNS? This fixes DNS resolution of the pod name and reverse lookups.

I believe this works in CoreDNS when you specify the option endpoint_pod_names, which is not part of the standard spec. In that case, it will still use hostname and subdomain, but if they are not set, it will use the pod name instead of the dashed version of the IP.
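
For reference, endpoint_pod_names is set inside the kubernetes block of the Corefile; a typical stanza would look roughly like this (illustrative, not a recommended production config):

.:53 {
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        endpoint_pod_names
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
    cache 30
}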

Headless service with endpoint_pod_names enabled and no hostname or subdomain set:

dnstools# host pause
pause.default.svc.cluster.local has address 172.17.0.14
pause.default.svc.cluster.local has address 172.17.0.16
pause.default.svc.cluster.local has address 172.17.0.4
pause.default.svc.cluster.local has address 172.17.0.9
dnstools# nslookup pause
Server:		10.96.0.10
Address:	10.96.0.10#53

Name:	pause.default.svc.cluster.local
Address: 172.17.0.14
Name:	pause.default.svc.cluster.local
Address: 172.17.0.16
Name:	pause.default.svc.cluster.local
Address: 172.17.0.4
Name:	pause.default.svc.cluster.local
Address: 172.17.0.9

dnstools# host -t srv pause
pause.default.svc.cluster.local has SRV record 0 25 443 pause-65bb4c479f-qv84p.pause.default.svc.cluster.local.
pause.default.svc.cluster.local has SRV record 0 25 443 pause-65bb4c479f-zc8lx.pause.default.svc.cluster.local.
pause.default.svc.cluster.local has SRV record 0 25 443 pause-65bb4c479f-q7lf2.pause.default.svc.cluster.local.
pause.default.svc.cluster.local has SRV record 0 25 443 pause-65bb4c479f-566rt.pause.default.svc.cluster.local.
dnstools# host -t ptr 172.17.0.14
14.0.17.172.in-addr.arpa domain name pointer pause-65bb4c479f-qv84p.pause.default.svc.cluster.local.

With a headless service, endpoint_pod_names, and hostname: foo plus subdomain: pause set on the pods, you will still get multiple A records back for the name:

dnstools# host foo.pause
foo.pause.default.svc.cluster.local has address 172.17.0.14
foo.pause.default.svc.cluster.local has address 172.17.0.16
foo.pause.default.svc.cluster.local has address 172.17.0.18
foo.pause.default.svc.cluster.local has address 172.17.0.9
dnstools# host -t ptr 172.17.0.16
16.0.17.172.in-addr.arpa domain name pointer foo.pause.default.svc.cluster.local.
dnstools#

With hostname set, the pod name and IP-named records do not exist (this is intentional):

dnstools# host pause-68f9ddd445-5k9r7.pause.default.svc.cluster.local
Host pause-68f9ddd445-5k9r7.pause.default.svc.cluster.local not found: 3(NXDOMAIN)
dnstools# host 172-17-0-9.pause.default.svc.cluster.local
Host 172-17-0-9.pause.default.svc.cluster.local not found: 3(NXDOMAIN)

Finally, with endpoint_pod_names disabled, you get the IP-dashed version of the name:

dnstools# host -t srv pause
pause.default.svc.cluster.local has SRV record 0 25 443 172-17-0-14.pause.default.svc.cluster.local.
pause.default.svc.cluster.local has SRV record 0 25 443 172-17-0-18.pause.default.svc.cluster.local.
pause.default.svc.cluster.local has SRV record 0 25 443 172-17-0-4.pause.default.svc.cluster.local.
pause.default.svc.cluster.local has SRV record 0 25 443 172-17-0-9.pause.default.svc.cluster.local.
dnstools# host pause-65bb4c479f-rljmb.pause.default.svc.cluster.local
Host pause-65bb4c479f-rljmb.pause.default.svc.cluster.local not found: 3(NXDOMAIN)
dnstools# host 172-17-0-9.pause.default.svc.cluster.local.
172-17-0-9.pause.default.svc.cluster.local has address 172.17.0.9
dnstools# host -t ptr 172.17.0.9
9.0.17.172.in-addr.arpa domain name pointer 172-17-0-9.pause.default.svc.cluster.local.
dnstools#

For the endpoint names, the spec just says it is the hostname or:

A unique, system-assigned identifier for the endpoint. The exact format and source of this identifier is not prescribed by this specification. However, it must be possible to use this to identify a specific endpoint in the context of a Service. This is used in the event no explicit endpoint hostname is defined.

So, endpoint_pod_names is in fact compliant with the spec - it just goes beyond what the spec says.

So, rather than updating kube-dns, why not just deploy CoreDNS?

@thockin
Member

thockin commented Jul 6, 2018

@krmayankk

@thockin yes, I am only addressing 1 so far. I am happy to update the DNS specification. @johnbelamaric how is this supported in CoreDNS? This fixes DNS resolution of the pod name and reverse lookups.

Pod names are not supposed to be resolvable. Making them reverse-resolvable is not a good thing, IMO. What is the use-case? The pod name is non-deterministic outside of a StatefulSet or manual control, so forward lookups are unlikely to be helpful. I understand the desire for reverse lookups. What if we simply returned the Service name in this case?

E.g. given:

$ dig +search +short headless.default
10.64.1.15
10.64.2.15
10.64.2.18

if hostname and subdomain were set:

$ dig +search +short -x 10.64.1.15
foo.headless.default.svc.cluster.local.

$ dig +search +short -x 10.64.2.15
bar.headless.default.svc.cluster.local.

$ dig +search +short -x 10.64.2.15
qux.headless.default.svc.cluster.local.

if subdomain was set, but hostname was not:

$ dig +search +short -x 10.64.1.15
headless.default.svc.cluster.local.

$ dig +search +short -x 10.64.2.15
headless.default.svc.cluster.local.

$ dig +search +short -x 10.64.2.15
headless.default.svc.cluster.local.

for completeness, if neither subdomain nor hostname were set:

$ dig +search +short -x 10.64.1.15

$ dig +search +short -x 10.64.2.15

$ dig +search +short -x 10.64.2.15

(and even this we could maybe fix by returning <pod-name>.<namespace>.pod.<suffix>, but to do that we'd need to watch all pods, so we would need a multi-level architecture. So more work. :)
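
A sketch of that reverse-lookup policy (illustrative only, with made-up names, not the actual kube-dns code):

package dns

import "fmt"

// ptrTarget sketches the policy described above: prefer <hostname>.<service>,
// fall back to just the Service name when only the subdomain/service is
// known, and return nothing otherwise.
func ptrTarget(hostname, service, namespace, zone string) string {
	switch {
	case hostname != "" && service != "":
		return fmt.Sprintf("%s.%s.%s.svc.%s.", hostname, service, namespace, zone)
	case service != "":
		return fmt.Sprintf("%s.%s.svc.%s.", service, namespace, zone)
	default:
		return "" // no PTR record in this case
	}
}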

Regarding not handling bogus TargetRef.Name collisions, I am having a hard time thinking of a case where TargetRef.Name will collide.

Remember that Endpoints should not be "trusted" too far. Users can manually set Endpoints in some cases.

but no one uses Pods without a workload controller

Don't think about polite users. Think about troublemakers.

I am not sure why the hostname was introduced at all. It seems it would only help when the pod is created on its own, without a ReplicaSet or Deployment, which is hardly ever the case.

StatefulSet

TL;DR: If a pod specifies subdomain and that value is actually a Service, we can solve the reverse lookup problem pretty easily (I think?). We don't need to trust pod names or anything else for that. If a pod specifies both hostname and subdomain, we're already doing the right thing (I think?).

Before we review code, we should get agreement on a change to the DNS spec.

@thockin added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 23, 2018
@thockin
Member

thockin commented Aug 23, 2018

I'm still very game to fix the bugs here, but throwing a hold on it for now.

@krmayankk mentioned this pull request Sep 25, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 21, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 21, 2018
@thockin
Member

thockin commented Jan 4, 2019

@krmayankk thoughts on what's next for this?

@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
