Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unsigned integer underflow #1696

Open
mxyns opened this issue Oct 9, 2023 · 3 comments
Open

unsigned integer underflow #1696

mxyns opened this issue Oct 9, 2023 · 3 comments

Comments

@mxyns
Copy link

mxyns commented Oct 9, 2023

Hello,

While using nom I encountered an issue with this impl of Offset for [u8]

nom/src/traits.rs

Lines 599 to 606 in 8c68e22

impl Offset for [u8] {
fn offset(&self, second: &Self) -> usize {
let fst = self.as_ptr();
let snd = second.as_ptr();
snd as usize - fst as usize
}
}

The return value underflows when the first span is after the second in memory. This means the user always has to think about whether to use first.offset(second) or second.offset(first):

fn main() {

    let buf = [0,1,2,3,4,5,6,7,8,9];
    let mut first: LocatedSpan<&[u8]>= LocatedSpan::from(&buf[..]);
    let mut second = some_parser::read(first);

    let offset = first.offset(&second);
    println!("offset is negative: {}", offset);

    let offset = second.offset(&first);
    println!("offset is positive: {}", offset);
}

To me, an offset means it may be negative but maybe changing the return type of the offset method to isize may be problematic elsewhere, I don't know a lot about the internals of nom.

Maybe just doing an absolute value of the result could be enough? or let Offset have a signed version of the offset method?

@danielocfb
Copy link

danielocfb commented May 3, 2024

Seeing the same issue. Also, I am not sure if "user error" is the right connotation here. The trait seems pretty broken if you ask me, as it appears to make the assumption that self and second are related, but that is not enforced or even documented anywhere to the degree I can tell. Okay, perhaps one can guess that relationship by what it does. And yet, this functionality is used in higher level APIs where this contract seems lost entirely.

E.g., this program panics to due over/underflow:

use nom::error::ErrorKind::Tag;
use nom::error::VerboseErrorKind::Nom;
use nom::error::convert_error;
use nom::error::VerboseError;

fn main() {
        let input = [31, 139, 8, 8, 85, 135, 48, 102, 2, 255, 108, 105, 98];
        let err = VerboseError {
            errors: vec![(String::from_utf8_lossy(&input), Nom(Tag))],
        };
        let _x = convert_error(String::from_utf8_lossy(&input), err);
}

the problem isn't really obvious at all. And in a general setting, neither may the fix be (on the user's end).

danielocfb pushed a commit to libbpf/blazesym that referenced this issue May 3, 2024
We have gotten reports of a crash caused by numeric overflow when
attempting to parse a Breakpad file. The reason lies in the error
conversion functionality that nom provides. Specifically, the
nom::error::convert_error() function expects errors with a context that
derefs to an str slice. In order to fulfill that contract we convert
everything to Cow<str>.
However, the function also implicitly assumes that substrings belong to
the same input slice. That is not the case, and cannot be easily fixed,
because there is no sane way of "relocating" arbitrary byte slice after
a lossy string conversion. The requirement of derefing into a str while
*also* mapping to byte slices and are assumed to be subslices of each
other just seems plain broken.
This change imports a copy of the convert_error() function and fixes up
the broken bits.
The upstream issue touching on this problem is:
rust-bakery/nom#1696

Signed-off-by: Daniel Müller <[email protected]>
@Geal
Copy link
Collaborator

Geal commented May 5, 2024

@mxyns @danielocfb as you saw, Offset assumes that the argument is part of the same slice and that should be documented. I looked at the change in blazesym is why the lossy conversion step would be needed in the first place? Because convert_error was not available for &[u8] inputs?

@danielocfb
Copy link

I looked at the change in blazesym is why the lossy conversion step would be needed in the first place? Because convert_error was not available for &[u8] inputs?

Yes, that is the reason. The function assumes something that derefs into str. So I we need some kind of conversion first.

Btw., I accidentally commented what is not the most fitting issue, I think #1619 actually covers this very problem already.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants