Unaligned bit arrays on the JavaScript target #3946

Open: wants to merge 1 commit into main

@richard-viney commented Dec 3, 2024

Summary 📘

This PR adds support for unaligned bit arrays on the JavaScript target. Specifically:

In expressions:

  • Arbitrarily sized integer segments:
    <<1:4>>
    <<12:9-little>>
    <<12:29-big>>
    <<1234:100-little>>
  • Arbitrarily sized bits segments:
    <<<<0xABCD:15>>:bits-10>>
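To make the expression forms above concrete, here is a minimal sketch, assuming a big-endian segment and a plain Uint8Array buffer, of how an arbitrarily sized integer such as `<<1:4>>` or `<<12:29-big>>` could be packed most significant bit first. `encodeIntBigEndian` is a hypothetical helper, not the prelude's API, and little-endian segments are not covered.

```javascript
// Hypothetical helper (not the Gleam prelude's API): encode `value` as a
// big-endian integer segment of `size` bits, most significant bit first.
function encodeIntBigEndian(value, size) {
  // Wrap to `size` bits (two's complement for negatives), mirroring how a
  // sized integer segment truncates a value that does not fit on Erlang.
  const bits = BigInt.asUintN(size, BigInt(value));
  const bytes = new Uint8Array(Math.ceil(size / 8));
  for (let i = 0; i < size; i++) {
    // Bit i of the segment, counting from the most significant bit
    if ((bits >> BigInt(size - 1 - i)) & 1n) {
      bytes[i >> 3] |= 0x80 >> (i & 7);
    }
  }
  // Unused low bits of the final byte stay zero in this sketch; the PR
  // leaves their value undefined in the real BitArray.
  return { bytes, bitSize: size };
}

const nibble = encodeIntBigEndian(1, 4);   // bytes [0x10], bitSize 4
const wide = encodeIntBigEndian(12, 29);   // bytes [0x00, 0x00, 0x00, 0x60], bitSize 29
```

Going through BigInt keeps the sketch short and correct for any size; the PR notes that a number-based path is considerably faster for small sizes.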

In patterns:

  • Arbitrarily sized int segments:
    let assert <<_:7, i:19-little-signed>> = <<0xABCDEF12:26>>
  • Sized and unsized bits segments:
    let assert <<_:7, a:bits-3, b:size(14)-bits, c:bits>> = <<0xABCDEF:24, 0x1234:16>>
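As a worked example of the sized and unsized bits pattern above, this sketch computes each segment's bit range in `<<0xABCDEF:24, 0x1234:16>>` and copies it out. `sliceBits` is a hypothetical bit-by-bit helper written for clarity rather than speed; it zeroes the unused low bits, which the prelude itself does not promise to do.

```javascript
// Hypothetical helper: copy bits [start, end) of `bytes` into a new,
// MSB-first buffer, one bit at a time.
function sliceBits(bytes, start, end) {
  const bitSize = end - start;
  const out = new Uint8Array(Math.ceil(bitSize / 8));
  for (let i = 0; i < bitSize; i++) {
    const src = start + i;
    const bit = (bytes[src >> 3] >> (7 - (src & 7))) & 1;
    if (bit) out[i >> 3] |= 0x80 >> (i & 7);
  }
  return { bytes: out, bitSize };
}

// <<0xABCDEF:24, 0x1234:16>> is 40 bits stored in 5 bytes
const input = new Uint8Array([0xab, 0xcd, 0xef, 0x12, 0x34]);
const inputBits = 40;

// let assert <<_:7, a:bits-3, b:size(14)-bits, c:bits>> = input
const a = sliceBits(input, 7, 10);         // bits 7..9:   bytes [0xE0], bitSize 3
const b = sliceBits(input, 10, 24);        // bits 10..23: bytes [0x37, 0xBC], bitSize 14
const c = sliceBits(input, 24, inputBits); // bits 24..39: bytes [0x12, 0x34], bitSize 16
```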

The compiler also emits a warning if these features are used when the Gleam version specified in gleam.toml is lower than v1.7.0.


Implementation Details 🛠️

  • The BitArray class in the prelude now has both bitSize and byteSize fields.
  • The value of any unused low bits in the final byte is undefined. These bits will be zero in many common use cases, but leaving them undefined allows additional optimisations in some operations.
  • The BitArray class in the prelude has been reworked in a few ways:
    • Public API for use in FFI code: get rawBuffer(), get bitSize(), get byteSize(), iterateBytes(), byteAt().
    • Deprecated API (used by existing FFI code): get buffer(), get length(), floatFromSlice(), intFromSlice(), binaryFromSlice(), sliceAfter().
    • Use of a deprecated API emits a warning at runtime.
    • JSDoc annotations have been added to all functions, allowing type-checking by adding // @ts-check to the top of the file.
    • BitArray.sliceToInt() has internal variants for aligned and unaligned access, as well as variants for both number and BigInt. The number variant is used when the size is <= 53 bits. BigInt is typically 5-10x slower, hence the decision to support both paths.
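A rough sketch of the size-based choice between number and BigInt arithmetic, assuming big-endian input and reading one bit at a time. `sliceToIntBigEndian` is hypothetical and does not reproduce the prelude's aligned and unaligned fast paths; it only illustrates why 53 bits is a natural cut-off, since every integer below 2^53 is exact as a JavaScript number.

```javascript
// Hypothetical sketch: read bits [start, end) of `bytes` as a big-endian
// integer, using plain numbers when the segment is at most 53 bits wide.
function sliceToIntBigEndian(bytes, start, end, signed) {
  const size = end - start;
  const bitAt = (i) => (bytes[i >> 3] >> (7 - (i & 7))) & 1;

  if (size <= 53) {
    // Number path: every intermediate value is an integer below 2^53,
    // so the arithmetic stays exact.
    let value = 0;
    for (let i = start; i < end; i++) value = value * 2 + bitAt(i);
    // Two's complement: subtract 2^size when the sign bit is set
    if (signed && size > 0 && value >= 2 ** (size - 1)) value -= 2 ** size;
    return value;
  }

  // BigInt path for wider segments: exact, but slower.
  let value = 0n;
  for (let i = start; i < end; i++) value = (value << 1n) | BigInt(bitAt(i));
  if (signed) value = BigInt.asIntN(size, value);
  // Gleam Ints are JavaScript numbers on this target, so convert back;
  // magnitudes above 2^53 lose precision at this step.
  return Number(value);
}

const small = sliceToIntBigEndian(new Uint8Array([0xab, 0xcd]), 7, 10, false); // number path
const wide = sliceToIntBigEndian(new Uint8Array(8).fill(0xff), 0, 64, true);   // BigInt path, -1
```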

Implications for @external JavaScript code 🌍

  • Existing JavaScript FFI code that operates on bit arrays needs to be updated. Until this is done, such code will emit deprecation warnings at runtime due to use of deprecated APIs such as BitArray.length and BitArray.buffer.
  • If such code is called with an unaligned bit array, it will round the bitSize up to a multiple of 8 and operate on the undefined low bits in the final byte, which will probably lead to the wrong output.
  • No existing code breaks, because unaligned bit arrays weren't previously possible on JavaScript. Still, Gleam code that is newly valid on the JavaScript target can now pass unaligned bit arrays to existing FFI code, which won't give the correct result for them.
  • I can make relevant updates to any affected packages to fix the deprecation warnings. Packages that only sensibly operate on whole bytes can error in the case of an unaligned bit array.
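For FFI authors, here is a sketch of a whole-byte function written against the getters listed in the implementation details (`bitSize`, `byteSize`, `byteAt()`), which errors on unaligned input instead of reading undefined low bits. `BitArrayStub` is a hypothetical stand-in so the snippet runs on its own; its warn-once behaviour for the deprecated `length` getter is an assumption, and `checksum` is an invented example. It throws for brevity, whereas a real binding would more likely return a Gleam `Result`.

```javascript
// Hypothetical stand-in for the prelude's BitArray. Getter names follow
// the PR description; the constructor and internals are assumptions.
class BitArrayStub {
  constructor(bytes, bitSize) {
    this._bytes = bytes;
    this._bitSize = bitSize;
  }
  get rawBuffer() { return this._bytes; }
  get bitSize() { return this._bitSize; }
  get byteSize() { return Math.ceil(this._bitSize / 8); }
  byteAt(index) { return this._bytes[index]; }
  // Deprecated getter for old FFI code: warns (once, in this stub) and
  // rounds the size up to whole bytes.
  get length() {
    if (!BitArrayStub.warned) {
      console.warn("BitArray.length is deprecated, use bitSize or byteSize");
      BitArrayStub.warned = true;
    }
    return this.byteSize;
  }
}

// Hypothetical FFI function that only makes sense for whole bytes.
function checksum(bitArray) {
  if (bitArray.bitSize % 8 !== 0) {
    throw new Error("checksum: bit array is not a whole number of bytes");
  }
  let sum = 0;
  for (let i = 0; i < bitArray.byteSize; i++) {
    sum = (sum + bitArray.byteAt(i)) & 0xff; // 8-bit additive checksum
  }
  return sum;
}

const ok = checksum(new BitArrayStub(new Uint8Array([1, 2, 3]), 24)); // 6
```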

Implications for gleam/stdlib 🤝

  • I have the updates for gleam/stdlib ready to go, mostly affecting gleam/bit_array. They can only be merged once this PR goes in, as their tests don't run on Gleam 1.6.3. It may be necessary to run the new stdlib tests on nightly for a short period, with them segregated into their own file so they can be included or excluded depending on the active Gleam version. I'll sort that out once this PR makes it through review.
  • Future stdlib versions that support unaligned bit arrays on JavaScript will work fine on Gleam versions < 1.7.0; there are no compatibility concerns there.
  • We could print a warning if unaligned bit arrays are used on JavaScript and the package's stdlib version doesn't support them. Should we do this? If so, I'd prefer to implement it in a follow-up PR if that's ok.

Testing 🧪

There is certainly some complexity here, including tricky bitwise operations, mostly in the JavaScript prelude. The following has been done to ensure correctness:

  • Many new tests have been added to language_tests.gleam and test/javascript_prelude.
  • Every path and branch through BitArray.slice(), BitArray.sliceToInt(), BitArray.sliceToFloat() is covered by at least one test.
  • Extensive fuzzing has been performed on bit array construction, slicing, and slicing to ints and floats.
    • This validated millions of combinations of bit array contents, segment sizes, offsets, endianness, signedness, etc. on JavaScript against the result on the Erlang target.
    • Issues found by this testing were fixed and added to the language tests and prelude tests.
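The PR's fuzzing compared JavaScript results against the Erlang target. A much smaller, single-runtime analogue of the same differential idea is sketched here: a byte-shifting slice (standing in for an optimised path) is checked against a bit-by-bit reference on random inputs. Both slice functions, the seed, and the use of mulberry32 as a reproducible PRNG are illustrative choices, not taken from the PR's test suite.

```javascript
// Reference slice: copies bits [start, end) one at a time (slow, obviously correct).
function sliceReference(bytes, start, end) {
  const bitSize = end - start;
  const out = new Uint8Array(Math.ceil(bitSize / 8));
  for (let i = 0; i < bitSize; i++) {
    const src = start + i;
    if ((bytes[src >> 3] >> (7 - (src & 7))) & 1) out[i >> 3] |= 0x80 >> (i & 7);
  }
  return out;
}

// Candidate slice: shifts whole bytes, then zeroes the unused low bits so
// both results can be compared byte for byte.
function sliceShifted(bytes, start, end) {
  const bitSize = end - start;
  const byteSize = Math.ceil(bitSize / 8);
  const first = start >> 3;
  const shift = start & 7;
  const out = new Uint8Array(byteSize);
  for (let i = 0; i < byteSize; i++) {
    const next = first + i + 1 < bytes.length ? bytes[first + i + 1] : 0;
    out[i] = ((bytes[first + i] << shift) | (next >> (8 - shift))) & 0xff;
  }
  const tail = bitSize & 7;
  if (tail !== 0) out[byteSize - 1] &= (0xff << (8 - tail)) & 0xff;
  return out;
}

// Small seeded PRNG (mulberry32) so any failure can be reproduced.
function mulberry32(seed) {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Compare both slices on random inputs; return the first mismatch, or null.
function fuzzSlices(iterations, seed) {
  const rand = mulberry32(seed);
  const randInt = (max) => Math.floor(rand() * (max + 1)); // 0..max inclusive
  for (let n = 0; n < iterations; n++) {
    const bytes = new Uint8Array(1 + randInt(7));
    for (let i = 0; i < bytes.length; i++) bytes[i] = randInt(255);
    const totalBits = bytes.length * 8;
    const start = randInt(totalBits);
    const end = start + randInt(totalBits - start);
    const expected = sliceReference(bytes, start, end);
    const actual = sliceShifted(bytes, start, end);
    if (expected.length !== actual.length || expected.some((x, i) => x !== actual[i])) {
      return { bytes: Array.from(bytes), start, end };
    }
  }
  return null;
}

console.log(fuzzSlices(100000, 1)); // null when the two implementations agree
```

Returning the failing input, rather than just a boolean, is what makes it straightforward to turn each discovered bug into a permanent regression test, as the PR describes doing.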

Limitations 🤔

The main limitation is that there is no allowance for unused high bits in the first byte of a bit array's buffer.

The motivation for allowing this would be to make bit array slices O(1) in all cases. Currently a slice is O(1) only if its start offset is byte-aligned (the end offset doesn't matter). If the start offset isn't byte-aligned then a slice is O(N) due to requiring a copy.

This makes the following O(N²) on JavaScript, but O(N) on Erlang:

import gleam/int
import gleam/io

pub fn print_bits(bits: BitArray) -> Nil {
  case bits {
    <<b:1, rest:bits>> -> {
      b |> int.to_string |> io.print
      print_bits(rest)
    }
    _ -> io.println("")
  }
}

This could be addressed at a later date, albeit with another round of impact on JavaScript FFI code that would need updating. So maybe it's better to bite the bullet now? Or maybe it's not important enough to warrant the additional complexity. There's also a reasonably good chance that any folks affected by this would be able to rework their code to avoid the performance issue (if they realise what the problem is).
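To illustrate the aligned versus unaligned slicing cost discussed in this section, here is a hypothetical `sliceBitArray` over a plain Uint8Array (not the prelude's implementation). An aligned start returns a `subarray()` view sharing the same `ArrayBuffer`, while any other start must shift every byte into a fresh buffer.

```javascript
// Hypothetical sketch: O(1) view for a byte-aligned start, O(N) copy otherwise.
function sliceBitArray(bytes, start, end) {
  const bitSize = end - start;
  const byteSize = Math.ceil(bitSize / 8);
  const first = start >> 3;
  const shift = start & 7;

  if (shift === 0) {
    // Aligned start: subarray() shares the underlying ArrayBuffer, no copy.
    return { bytes: bytes.subarray(first, first + byteSize), bitSize };
  }

  // Unaligned start: build each output byte from two neighbouring bytes.
  const out = new Uint8Array(byteSize);
  for (let i = 0; i < byteSize; i++) {
    const high = bytes[first + i] << shift;
    const low = first + i + 1 < bytes.length ? bytes[first + i + 1] >> (8 - shift) : 0;
    out[i] = (high | low) & 0xff;
  }
  // Bits past `end` in the final byte keep whatever was shifted in, in
  // line with the "undefined low bits" rule, rather than being zeroed.
  return { bytes: out, bitSize };
}

const src = new Uint8Array([0xab, 0xcd, 0xef]);
const aligned = sliceBitArray(src, 8, 24);   // view onto src: bytes [0xCD, 0xEF]
const unaligned = sliceBitArray(src, 4, 12); // new buffer: bytes [0xBC]
```

Each `<<b:1, rest:bits>>` match in `print_bits` takes the unaligned branch, and because the copied result starts at bit 0 again, the next call is unaligned as well, so the total copying grows with the square of the input length.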


✨✨✨

@richard-viney marked this pull request as ready for review December 3, 2024 12:22
@richard-viney force-pushed the js-unaligned-bit-arrays branch 3 times, most recently from 2496b81 to bb69dd9, December 5, 2024 01:45
@richard-viney force-pushed the js-unaligned-bit-arrays branch 2 times, most recently from c5b24c8 to b347126, December 13, 2024 08:05
@richard-viney force-pushed the js-unaligned-bit-arrays branch from b347126 to 9aa5446, December 13, 2024 09:32