Recommended practice for adding reserved characters? #44
Comments
IRIs vs. URLs is a red herring here. Reserved characters in a URL are still reserved in an IRI; the decoding that happens is all reversible and idempotent. The issue here is about whether you are manipulating the text of the URL itself (which is what URL has always been; twisted.python.url just had a bug where it would let you put in some invalid characters), or whether you're manipulating a value stored in a URL, which is not visible as part of the URL. In other words: does |
A quick sketch of a two-type proposal:
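(The sketch itself wasn't captured in this copy of the thread. As a stand-in, here is a hypothetical illustration of the two-type idea: one type operating on the encoded text of the URL, the other on decoded values, using the stdlib's `quote`/`unquote` for escaping. None of the names below are hyperlink's actual API.)

```python
from urllib.parse import quote, unquote

from hyperlink import URL  # operates on the (minimally-encoded) text of the URL


class DecodedView(object):
    """Hypothetical second type: accepts and returns values, not encoded text."""

    def __init__(self, url):
        self._url = url  # the text-level hyperlink.URL underneath

    def add(self, name, value):
        # Reserved characters in `value` are treated as data, so they are
        # escaped before they reach the text-level type.
        return DecodedView(self._url.add(quote(name, safe=""),
                                         quote(value, safe="")))

    def get(self, name):
        # Symmetrically, values come back out decoded.
        return [unquote(v) for v in self._url.get(quote(name, safe=""))]

    def to_text(self):
        return self._url.to_text()
```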
I think I actually like this, perhaps with the caveat that this could be a |
I strongly dislike exposing a bunch of public APIs for encoding and decoding, since it's easy to screw that up, and you end up passing around context-free strings all the time, ending in the inevitable concatenation of some unquoted HTML, some quoted URL, some unquoted SQL, etc. |
Yeah, agree. I am probably going to expose the encoding/decoding functions as a sorta hazmat utility suite, because I've occasionally needed just the encoding for one part of the URL. But, yes, at this point, I mostly envision a |
Please don't do this :-(. The minimal API is one of |
The functions don't go on the URL type? The idea is to separate them out into a module, and they're worth exposing exactly because urllib's unqualified blanket approach leads to error-prone code. |
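(As an illustration of that point, using the stdlib rather than hyperlink's helpers: `urllib.parse.quote` applies whatever `safe` set the caller picks, regardless of which URL component is being built, so the caller has to know each field's reserved set; per-field encoders would bake that knowledge in.)

```python
from urllib.parse import quote

value = u"#value"

# The caller has to choose the right "safe" set for every context:
quote(value, safe="")    # '%23value' -- correct for a query value
quote(value, safe="#")   # '#value'   -- silently leaves the delimiter in place
quote(u"a/b")            # 'a/b'      -- default safe='/' is wrong for a single
                         #              path segment containing a literal '/'
```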
Everything in that region strikes me as a private implementation detail which I would not want exposed / supported; it's an end run around the integrity of the URL object. But it looks like we have broad consensus around |
So I'm cruising along over here and it's looking like the straightforward approach will work fine. Each field needs individual handling, but that's ok because it lets us support conveniences like #48. Also, in a greenfield environment I agree with @wsanchez that I wish |
Merry Christmas @glyph and @wsanchez, your DecodedURL is all but ready. It is somewhat tested and ready for review here: #54. Along the way I went on a longish Unicode journey; somewhat terse notes here: https://gist.github.com/mahmoud/7bc696254a738404bc281c270b169613 |
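(A rough usage sketch of what the new type is meant to enable, based on the discussion above and the eventual release; exact method behavior may differ in detail.)

```python
from hyperlink import DecodedURL

d = DecodedURL.from_text(u"http://example.com/")

# Values are treated as data, so reserved characters are fine to pass in:
d2 = d.add(u"param", u"#value")

d2.to_uri().to_text()   # percent-encoded form, e.g. 'http://example.com/?param=%23value'
d2.get(u"param")        # decoded values back out, e.g. ['#value']
```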
🎄 |
I believe this issue should now be closed? |
Yes, I plan on closing these issues when the code + docs are released. I've come down with a bit of a bug myself, but I'll get around to it! |
Ah, I miss Launchpad's distinction between fix committed and fix released :) |
18.0.0 released today! 🎉 |
Per @markrwilliams' comment here and a few others dotted around, we're facing a design gap in hyperlink's APIs. To paraphrase: adding a query parameter whose value contains a reserved character, such as `#`, yields a `ValueError`.
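(The original snippet and its traceback weren't preserved in this copy of the issue; the following is a reconstruction of the kind of call being described, using hyperlink's `URL` type.)

```python
from hyperlink import URL

url = URL.from_text(u"http://example.com/")

# '#' is a reserved character (the fragment delimiter). Because hyperlink
# stores minimally-encoded text, passing it straight through as part of a
# query value is rejected rather than silently re-encoded:
url.add(u"param", u"#value")   # raises ValueError
```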
This is due to a subtle shift in hyperlink's design compared to `twisted.python.url`. `t.p.url` would allow any string value in, whereas hyperlink prefers to store the "minimally-encoded" version. This is why a `ValueError` is raised from the code above.
Technically, this can be solved by making the code `url.add(u'param', _encode_query_part(u'#value'))`. But hyperlink's primary goal is to handle encoding/decoding; does it really make sense to push that back on the user?
One solution Mark and I discussed would be to switch to decoding every value passed in. But what if someone were to pass in `u'%23%'` and actually intend for that to be their decoded value? And the API would be further complicated by the fact that the underlying encoding is generally unknown: UTF-8, Latin-1, and plain old binary are all valid in percent-encoded URL parts. Autodecoding UTF-8 might have better usability most of the time, but much like relying on Python 2's implicit encoding/decoding, the safety of the explicit `_encode_*_part()` functions is probably preferable.
It might occur to one that this entire problem bears some resemblance to the `bytes`/`unicode` split, as `URL` has `URL.to_uri()` and `URL.to_iri()`. There is some truth to this, but IRIs and URIs are both URLs. Having two types imposes a sort of artificial split I'd like to avoid if possible, but we also don't have a good way to represent an already-decoded IRI. This was causing an issue with double decoding on multiple `.to_iri()` calls (see #16).
Right now my best idea is to enable that technical solution above by exposing the various encoding and decoding functions as public APIs, since those may prove useful utilities in other contexts anyway. I'm sure there are better ideas, too, so I'm going to leave this issue open as a place for discussion on handling this quandary.
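(For a concrete sense of the trade-off, here is the ambiguity that makes blanket auto-decoding risky, sketched with the stdlib's `quote`/`unquote` standing in for hyperlink's internal `_encode_*_part()` helpers; the eventual public names, if any, weren't settled at this point.)

```python
from urllib.parse import quote, unquote

# If every incoming value were auto-decoded, a caller who literally means
# the text '%23%' would have it silently reinterpreted:
unquote(u"%23%")            # -> '#%'  ('%23' is consumed as an escape sequence)

# An explicit, caller-driven encoding step keeps the intent unambiguous:
quote(u"#value", safe="")   # -> '%23value', safe to embed as a query value
quote(u"%23%", safe="")     # -> '%2523%25', round-trips back to the literal text
```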