Hypothesis: builtins.UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte #153
It would be helpful to catch this error and print the URL that produced it, so one might see what data is tripping us up.
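As a rough illustration of that idea, here is a minimal sketch that uses only the standard library rather than Hyperlink itself. The helper name `decode_path_or_report` is hypothetical, and it only approximates what a decoding layer would do for the path: percent-decode each segment as UTF-8 and, on failure, re-raise with the offending URL and bytes in the message.

```python
from urllib.parse import unquote_to_bytes, urlsplit


def decode_path_or_report(url_text):
    """Percent-decode each path segment as UTF-8.

    Hypothetical helper: on failure, raise a ValueError that names the
    URL and the bytes that could not be decoded.
    """
    segments = urlsplit(url_text).path.split("/")
    try:
        # unquote_to_bytes() turns %XX escapes into raw bytes and encodes
        # any literal non-ASCII characters as UTF-8.
        return [unquote_to_bytes(s).decode("utf-8") for s in segments]
    except UnicodeDecodeError as e:
        raise ValueError(
            "cannot decode path of {!r}: error-causing bytes {!r}".format(
                url_text, e.object
            )
        ) from e
```

Calling `decode_path_or_report('http://0.0/%80')` would then fail with a message that includes both the URL and `b'\x80'`, which is the kind of output shown in the failing examples.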
Here are some failing examples:

error-causing bytes: b'\x80'
URL: URL.from_text('http://0.0/%80')

error-causing bytes: b'\xe1\x8c\x84\xc3\xa9\xf1\xb1\xa9\x9d\x9b'
URL: URL.from_text('https://ɓ.ő𣫫á:26/ጄé\U00071a5d%9b')

error-causing bytes: b'\xe1\x8c\x84\xc3\xa9\xf1\xb1\xa9\x9d\x9b0'
URL: URL.from_text('https://𐎹pɓ.ő𣫫á:51159/ጄé\U00071a5d%9b0/E7*\x13𐬃\x94\x8e')

error-causing bytes: b'\xe1\x8c\x84\xc3\xa9\xf1\xb1\xa9\x9d\x9b0'
URL: URL.from_text('https://𐎹p1ɜ10貭.в.𢙑dɓ.ő𣫫á:51159/ጄé\U00071a5d%9b0/E7*\x13\U0004216a\x9d𠤈\x94\x8e')
…which one can reproduce in the REPL:

>>> from hyperlink import EncodedURL, DecodedURL
>>> encodedURL = EncodedURL.from_text('http://0.0/%80')
>>> encodedURL
URL.from_text('http://0.0/%80')
>>> DecodedURL(encodedURL)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/wsanchez/Dropbox/Developer/Twisted/klein/.tox/coverage-py38-twcurrent/lib/python3.8/site-packages/hyperlink/_url.py", line 2046, in __init__
self.host, self.userinfo, self.path, self.query, self.fragment
File "/Users/wsanchez/Dropbox/Developer/Twisted/klein/.tox/coverage-py38-twcurrent/lib/python3.8/site-packages/hyperlink/_url.py", line 2177, in path
[
File "/Users/wsanchez/Dropbox/Developer/Twisted/klein/.tox/coverage-py38-twcurrent/lib/python3.8/site-packages/hyperlink/_url.py", line 2178, in <listcomp>
_percent_decode(p, raise_subencoding_exc=True)
File "/Users/wsanchez/Dropbox/Developer/Twisted/klein/.tox/coverage-py38-twcurrent/lib/python3.8/site-packages/hyperlink/_url.py", line 766, in _percent_decode
return unquoted_bytes.decode(subencoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

>>> encodedURL = EncodedURL.from_text('https://ɓ.ő𣫫á:26/ጄé\U00071a5d%9b')
>>> encodedURL
URL.from_text('https://ɓ.ő𣫫á:26/ጄé\U00071a5d%9b')
>>> DecodedURL(encodedURL)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/wsanchez/Dropbox/Developer/Twisted/klein/.tox/coverage-py38-twcurrent/lib/python3.8/site-packages/hyperlink/_url.py", line 2046, in __init__
self.host, self.userinfo, self.path, self.query, self.fragment
File "/Users/wsanchez/Dropbox/Developer/Twisted/klein/.tox/coverage-py38-twcurrent/lib/python3.8/site-packages/hyperlink/_url.py", line 2177, in path
[
File "/Users/wsanchez/Dropbox/Developer/Twisted/klein/.tox/coverage-py38-twcurrent/lib/python3.8/site-packages/hyperlink/_url.py", line 2178, in <listcomp>
_percent_decode(p, raise_subencoding_exc=True)
File "/Users/wsanchez/Dropbox/Developer/Twisted/klein/.tox/coverage-py38-twcurrent/lib/python3.8/site-packages/hyperlink/_url.py", line 766, in _percent_decode
return unquoted_bytes.decode(subencoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9b in position 9: invalid start byte
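To see why both tracebacks say "invalid start byte", it may help to look at the raw bytes directly. The sketch below is plain Python with no Hyperlink involved; `describe_decode_failure` is a hypothetical helper that reports the valid prefix, the failing offset, and the offending byte. In both cases the byte that fails (0x80, 0x9b) falls in the UTF-8 continuation range 0x80-0xBF, so it cannot begin a character.

```python
def describe_decode_failure(raw, encoding="utf-8"):
    """Return (valid_prefix, offset, bad_byte), or None if raw decodes cleanly.

    Hypothetical helper for inspecting the byte strings from the examples.
    """
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as e:
        # e.start is the index of the first undecodable byte; everything
        # before it is valid in the given encoding.
        prefix = raw[: e.start].decode(encoding)
        return prefix, e.start, raw[e.start : e.start + 1]
    return None


# The first two "error-causing bytes" values from the failing examples.
first = describe_decode_failure(b"\x80")
second = describe_decode_failure(b"\xe1\x8c\x84\xc3\xa9\xf1\xb1\xa9\x9d\x9b")
```

For the second value, the first nine bytes decode to the three characters of the path segment, and the stray `%9b` escape supplies the tenth byte at offset 9, matching the position reported in the traceback.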
@wsanchez Yes.
I think DecodedURL has a bit of leeway with a URL like this: it could mangle it, or not make it completely round-trippable through every API. Browsers have to cope with this kind of mess, and they definitely do some mangling. For example, if you try pasting
If you were to manipulate a busted URL like this, or manually create a copy via moving strings with DecodedURL, you'd get
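Two possible "mangling" policies can be sketched with the standard library alone. This is an illustration under assumptions, not a description of what DecodedURL or any browser actually does; `lenient_decode` is a hypothetical name. The first policy leaves an undecodable segment in its percent-encoded form, and the second substitutes U+FFFD, which is the default behaviour of `urllib.parse.unquote`.

```python
from urllib.parse import unquote, unquote_to_bytes


def lenient_decode(segment):
    """Decode a percent-encoded segment as UTF-8 if possible.

    Hypothetical policy: if the bytes are not valid UTF-8, return the
    segment unchanged instead of raising.
    """
    try:
        return unquote_to_bytes(segment).decode("utf-8")
    except UnicodeDecodeError:
        return segment


kept = lenient_decode("%80")   # stays percent-encoded
replaced = unquote("%80")      # errors="replace" is unquote's default
```

Keeping the segment encoded preserves the original bytes, so the URL can still round-trip. Replacing with U+FFFD loses them, which is the sort of mangling that trades fidelity for never raising.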
The Hypothesis strategies now shipping with Hyperlink are producing this error occasionally in Klein: