failed to parse multiaddr: unknown protocol http-path #492

Open
bajtos opened this issue Dec 6, 2024 · 9 comments

@bajtos
Contributor

bajtos commented Dec 6, 2024

❯ lassie fetch -o /dev/null -v QmdmQXB2mzChmMeKY47C43LxUdg1NDJ5MWcKMKxDu7RgQm
Fetching QmdmQXB2mzChmMeKY47C43LxUdg1NDJ5MWcKMKxDu7RgQm
could not get retrieval candidates for QmdmQXB2mzChmMeKY47C43LxUdg1NDJ5MWcKMKxDu7RgQm: failed to parse multiaddr "/dns/http.f02620.devtty.eu/https/http-path/%2Fipni-provider%2F12D3KooWNXvbyvLUUd1qQEqhzjTpVoT5fdYUZEv4RJSxZ3rDF2c7": unknown protocol http-path

I discovered this while troubleshooting a failing test. The test used to work fine until recently.

@bajtos
Contributor Author

bajtos commented Dec 6, 2024

Version info:

  • go 1.22.6
  • lassie 0.23.2

❯ go1.22.6 version
go version go1.22.6 darwin/arm64
❯ go1.22.6 install ./cmd/lassie
❯ lassie version
lassie version v0.23.2-d8f473e
❯ lassie fetch -o /dev/null -v QmdmQXB2mzChmMeKY47C43LxUdg1NDJ5MWcKMKxDu7RgQm
Fetching QmdmQXB2mzChmMeKY47C43LxUdg1NDJ5MWcKMKxDu7RgQm
could not get retrieval candidates for QmdmQXB2mzChmMeKY47C43LxUdg1NDJ5MWcKMKxDu7RgQm: failed to parse multiaddr "/dns/http.f02620.devtty.eu/https/http-path/%2Fipni-provider%2F12D3KooWNXvbyvLUUd1qQEqhzjTpVoT5fdYUZEv4RJSxZ3rDF2c7": unknown protocol http-path

@rvagg
Member

rvagg commented Dec 9, 2024

eh, I think this means we need to get go-libipni updated. The original unofficial component was httpath, but it was later registered, renamed to http-path during negotiation, and made official.

This PR: ipni/go-libipni#206

Comes after the version we are currently using here: https://github.com/ipni/go-libipni/releases/tag/v0.6.6
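
To illustrate why a protocol rename produces this error, here is a minimal sketch of a multiaddr parser driven by a table of known protocol names. It is not the go-multiaddr or go-libipni implementation; `OLD_TABLE`, `NEW_TABLE`, and `parseMultiaddr` are hypothetical names, and the tables list only the protocols that appear in this issue.

```javascript
// Hypothetical protocol tables: name -> whether the protocol carries a value.
// OLD_TABLE models a parser that only knows the unofficial `httpath` name.
const OLD_TABLE = new Map([
  ['dns', true],
  ['https', false],
  ['httpath', true]
])
// NEW_TABLE additionally knows the registered `http-path` name.
const NEW_TABLE = new Map([...OLD_TABLE, ['http-path', true]])

// Split a textual multiaddr into { name, value } components.
function parseMultiaddr (addr, table) {
  const parts = addr.split('/').slice(1) // drop the empty string before the leading '/'
  const components = []
  for (let i = 0; i < parts.length; i++) {
    const name = parts[i]
    if (!table.has(name)) {
      throw new Error(`failed to parse multiaddr "${addr}": unknown protocol ${name}`)
    }
    let value
    if (table.get(name)) {
      value = parts[++i] // value-carrying protocols consume the next segment
      if (value === undefined) throw new Error(`missing value for protocol ${name}`)
    }
    components.push({ name, value })
  }
  return components
}

const addr = '/dns/http.f02620.devtty.eu/https/http-path/%2Fipni-provider%2F12D3KooWNXvbyvLUUd1qQEqhzjTpVoT5fdYUZEv4RJSxZ3rDF2c7'

try {
  parseMultiaddr(addr, OLD_TABLE)
} catch (err) {
  console.log(err.message) // message ends with "unknown protocol http-path"
}
console.log(parseMultiaddr(addr, NEW_TABLE).map(c => c.name).join(' > '))
```

With the old table the parse stops at the first name it does not recognise, which is the failure mode reported above; adding the registered name to the table lets the same address parse.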

@rvagg
Member

rvagg commented Dec 17, 2024

@bajtos do you have any other CIDs I can try for this? Nothing I try to retrieve seems to find candidates but I'm not sure if it's because I've messed something up on my end or if I just don't have a good range of indexed CIDs to try out.

@rvagg
Member

rvagg commented Dec 17, 2024

#495 would be worth a test if you have something to try out with

@bajtos
Contributor Author

bajtos commented Dec 18, 2024

> @bajtos do you have any other CIDs I can try for this? Nothing I try to retrieve seems to find candidates but I'm not sure if it's because I've messed something up on my end or if I just don't have a good range of indexed CIDs to try out.

I don't have any other CIDs, unfortunately. I discovered the problem by coincidence because our tests were using the CID QmdmQXB2mzChmMeKY47C43LxUdg1NDJ5MWcKMKxDu7RgQm to verify integration with lassie.

Maybe you can ask the IPNI folks if they can find you CIDs stored with providers advertising an http-path address.

Here is an idea you can try:

  • https://cid.contact/providers provides a list of all index providers, including their addresses and the CID of their latest advertisement
  • you can filter the providers to find those with http-path in their multiaddr
  • then you can inspect their latest advertisement to find the entries (payload CIDs) they advertised to IPNI

Example provider info using http-path:

  {
    "AddrInfo": {
      "ID": "12D3KooWA9V3M5aEeZMfaGyxs6nCqW3QeZUidgkWkfhLJftdr3Ae",
      "Addrs": [
        "/dns/s1.node.storacha.network/https/http-path/blob%2F%7Bblob%7D",
        "/dns/s1.node.storacha.network/https/http-path/claim%2F%7Bclaim%7D"
      ]
    },
    "LastAdvertisement": {
      "/": "baguqeeradgcmpwo5ufjpd7qlakr4cqxqyvqqd276llglhx5lmvgk4lv7beuq"
    },
    "LastAdvertisementTime": "2024-12-12T00:02:15Z",
    "Publisher": {
      "ID": "12D3KooWA9V3M5aEeZMfaGyxs6nCqW3QeZUidgkWkfhLJftdr3Ae",
      "Addrs": [
        "/dns/s1.node.storacha.network/https"
      ]
    },
    "FrozenAt": null
  },
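
The filtering step from the list above can be sketched as a small function. This assumes the providers endpoint returns a JSON array of records shaped like the example above; `hasHttpPathAddr` and `filterHttpPathProviders` are hypothetical helper names, and the second sample record is made up for illustration.

```javascript
// True if any advertised address contains an `http-path` component.
function hasHttpPathAddr (provider) {
  const addrs = provider?.AddrInfo?.Addrs ?? []
  return addrs.some(addr => addr.split('/').includes('http-path'))
}

// Keep only the provider records that advertise an http-path address.
function filterHttpPathProviders (providers) {
  return providers.filter(hasHttpPathAddr)
}

// Sample input: the first record is abbreviated from the example above,
// the second is a made-up record without an http-path address.
const sampleProviders = [
  {
    AddrInfo: {
      ID: '12D3KooWA9V3M5aEeZMfaGyxs6nCqW3QeZUidgkWkfhLJftdr3Ae',
      Addrs: ['/dns/s1.node.storacha.network/https/http-path/blob%2F%7Bblob%7D']
    },
    LastAdvertisement: { '/': 'baguqeeradgcmpwo5ufjpd7qlakr4cqxqyvqqd276llglhx5lmvgk4lv7beuq' }
  },
  {
    AddrInfo: { ID: 'made-up-peer-id', Addrs: ['/ip4/192.0.2.1/tcp/1234'] },
    LastAdvertisement: null
  }
]

for (const p of filterHttpPathProviders(sampleProviders)) {
  // Print the peer ID and the CID of its latest advertisement.
  console.log(p.AddrInfo.ID, p.LastAdvertisement['/'])
}
```

Against the live list, the input would come from something like `await (await fetch('https://cid.contact/providers')).json()`, assuming a Node.js version that ships a global `fetch`.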

The advertisement can be found here:

https://s1.node.storacha.network/ipni/v1/ad/baguqeeradgcmpwo5ufjpd7qlakr4cqxqyvqqd276llglhx5lmvgk4lv7beuq

Then you need to resolve the Entries link:

https://s1.node.storacha.network/ipni/v1/ad/baguqeeramfxghwgswbneqz56thsgjwyekkr4liqrikd6wja6oq5c3vwtviia

{
  "Entries": [
    {
      "/": {
        "bytes": "EiDO0lRclII71qwYDzLuluWewpNRmnonxuNAZp1XxXi+oQ"
      }
    }
  ]
}
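
The two lookups above follow the same URL pattern. A minimal sketch, assuming the publisher address has the simple `/dns/<host>/https` form shown in the provider record and that the decoded advertisement exposes its entries link as `Entries['/']`; `publisherBaseUrl`, `adUrl`, and `entriesUrl` are hypothetical helpers:

```javascript
// Convert a publisher multiaddr of the form /dns/<host>/https (or /http) to a base URL.
// Other address shapes (ports, IP addresses) are rejected in this sketch.
function publisherBaseUrl (multiaddr) {
  const [, proto, host, scheme, ...rest] = multiaddr.split('/')
  const dnsProtos = ['dns', 'dns4', 'dns6']
  if (!dnsProtos.includes(proto) || !host || !['http', 'https'].includes(scheme) || rest.length > 0) {
    throw new Error(`unsupported publisher multiaddr: ${multiaddr}`)
  }
  return `${scheme}://${host}`
}

// URL of an advertisement (or entries chunk) addressed by its CID.
function adUrl (publisherAddr, cid) {
  return `${publisherBaseUrl(publisherAddr)}/ipni/v1/ad/${cid}`
}

// URL of the entries chunk linked from a decoded advertisement object.
function entriesUrl (publisherAddr, advertisement) {
  return adUrl(publisherAddr, advertisement.Entries['/'])
}

const publisher = '/dns/s1.node.storacha.network/https'
console.log(adUrl(publisher, 'baguqeeradgcmpwo5ufjpd7qlakr4cqxqyvqqd276llglhx5lmvgk4lv7beuq'))
console.log(entriesUrl(publisher, {
  Entries: { '/': 'baguqeeramfxghwgswbneqz56thsgjwyekkr4liqrikd6wja6oq5c3vwtviia' }
}))
```

The two printed URLs are the advertisement and entries URLs quoted above.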

Finally, you need to build a CID from the multihash bytes.

I am using the following Node.js snippet; you can also use https://github.com/willscott/cid-utils.

import { CID } from 'multiformats/cid'
import * as multihash from 'multiformats/hashes/digest'

// Base64-encoded multihash bytes copied from the Entries list above
const entryHash = 'EiDO0lRclII71qwYDzLuluWewpNRmnonxuNAZp1XxXi+oQ'
// Decode the multihash and wrap it in a CIDv1 with the raw (0x55) codec
const payloadCid = CID.create(1, 0x55 /* raw */, multihash.decode(Buffer.from(entryHash, 'base64'))).toString()
console.log(payloadCid)

Here is the final payload CID you can try to retrieve:

bafkreigo2jkfzfechplkygapglxjnzm6ykjvdgt2e7dogqdgtvl4k6f6ue
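
For readers without the multiformats package at hand, the same derivation can be sketched with Node.js built-ins only. This is an illustration of the encoding steps, not a replacement for a CID library: it assumes a sha2-256 multihash and the raw codec, and `base32Lower` and `rawCidFromMultihash` are hypothetical helper names.

```javascript
// RFC 4648 base32 with a lowercase alphabet and no padding.
function base32Lower (bytes) {
  const alphabet = 'abcdefghijklmnopqrstuvwxyz234567'
  let out = ''
  let bits = 0 // number of bits currently buffered in `value`
  let value = 0
  for (const byte of bytes) {
    value = (value << 8) | byte
    bits += 8
    while (bits >= 5) {
      out += alphabet[(value >>> (bits - 5)) & 31]
      bits -= 5
    }
    value &= (1 << bits) - 1 // keep only the leftover bits
  }
  // Pad the final partial group with zero bits on the right.
  if (bits > 0) out += alphabet[(value << (5 - bits)) & 31]
  return out
}

// Build a CIDv1 string with the raw codec (0x55) from base64 multihash bytes.
function rawCidFromMultihash (base64Multihash) {
  const mh = Buffer.from(base64Multihash, 'base64')
  // Sanity check for this sketch: 0x12 = sha2-256, 0x20 = 32-byte digest.
  if (mh.length !== 34 || mh[0] !== 0x12 || mh[1] !== 0x20) {
    throw new Error('expected a sha2-256 multihash')
  }
  // CIDv1 bytes = <version 0x01> <codec 0x55> <multihash>;
  // both prefix values fit in a single varint byte.
  const cidBytes = Buffer.concat([Buffer.from([0x01, 0x55]), mh])
  // The leading "b" is the multibase prefix for lowercase base32.
  return 'b' + base32Lower(cidBytes)
}

console.log(rawCidFromMultihash('EiDO0lRclII71qwYDzLuluWewpNRmnonxuNAZp1XxXi+oQ'))
// prints bafkreigo2jkfzfechplkygapglxjnzm6ykjvdgt2e7dogqdgtvl4k6f6ue
```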

@bajtos
Contributor Author

bajtos commented Dec 18, 2024

I'd also like to clarify that this issue is not a problem for Spark. In Spark, we implemented a custom IPNI resolution step, and we use Lassie to perform a Graphsync retrieval from the given provider multiaddr. We are not triggering the problem described in this issue.

@rvagg
Member

rvagg commented Dec 19, 2024

I guess storacha isn't doing http retrievals, or they're doing something custom, because that metadata they set in ipni can't be decoded as one of the standard ipni metadata types. 🤷 I guess I'll just merge and tag and see if anyone says anything.

@rvagg
Member

rvagg commented Dec 19, 2024

I am still puzzled why Spark is persisting with graphsync though. There's zero maintenance on it, it's a buggy protocol, we've been ripping it out of tooling wherever we can, and there was a whole lot of effort put in last year to embed bitswap into boost and then develop a whole new protocol (http) that's much better at doing this. It's not surprising that retrievability stats are so poor if we're measuring based on graphsync.

@bajtos
Contributor Author

bajtos commented Dec 19, 2024

> I am still puzzled why Spark is persisting with graphsync though. There's zero maintenance on it, it's a buggy protocol, we've been ripping it out of tooling wherever we can, and there was a whole lot of effort put in last year to embed bitswap into boost and then develop a whole new protocol (http) that's much better at doing this. It's not surprising that retrievability stats are so poor if we're measuring based on graphsync.

We see that ~2/3rds of successful retrievals happen over Graphsync.

It seems to me that the easiest way to configure Boost & Venus is to enable Graphsync, and so that's what most SPs do when they need to enable Spark to start measuring them.

Our plan is to drop Graphsync in Spark v2, together with other breaking changes we are planning.
