Unaligned bit arrays on the JavaScript target #3946

richard-viney · 2024-12-03T11:54:30Z

Summary 📘

This PR adds support for unaligned bit arrays on the JavaScript target. Specifically:

In expressions:

Arbitrary sized integer segments:

```
<<1:4>>
<<12:9-little>>
<<12:29-big>>
<<1234:100-little>>
```

Arbitrary sized bits segments:
```
<<<<0xABCD:15>>:bits-10>>
```

In patterns:

Arbitrary sized int segments:

```
let assert <<_:7, i:19-little-signed>> = <<0xABCDEF12:26>>
```

Sized and unsized bits segments:

```
let assert <<_:7, a:bits-3, b:size(14)-bits, c:bits>> = <<0xABCDEF:24, 0x1234:16>>
```

There is also a warning if the above features are used when gleam.toml specifies a version < v1.7.0.

Implementation Details 🛠️

The BitArray class in the prelude now has both bitSize and byteSize fields.
The values of any unused low bits in the final byte are undefined. They will be zero in many common use cases, but leaving them undefined allows for additional optimisations on some operations.
The BitArray class in the prelude has been reworked in a few ways:
- Public API for use in FFI code: get rawBuffer(), get bitSize(), get byteSize(), iterateBytes(), byteAt().
- Deprecated API (used by existing FFI code): get buffer(), get length(), floatFromSlice(), intFromSlice(), binaryFromSlice(), sliceAfter().
- Use of a deprecated API emits a warning at runtime.
- JSDoc annotations have been added to all functions allowing type-checking by adding // @ts-check to the top of the file.
- BitArray.sliceToInt() has internal variants for aligned and unaligned access, as well as variants for both number and BigInt. The number variant is used when the size is <= 53 bits. BigInt is typically 5-10x slower, hence the decision to support both paths.
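
As a rough illustration of the number/BigInt split described in the last bullet, the sketch below reads an unsigned big-endian integer from an arbitrary bit range of a Uint8Array. It is not the prelude's implementation: the name `readUnsignedBigEndian` is made up for this example, only unsigned big-endian values are handled, and the buffer is walked bit by bit for clarity rather than speed.

```javascript
// Hypothetical helper, NOT the prelude's BitArray.sliceToInt().
// Reads an unsigned big-endian integer from bits [start, end) of a Uint8Array.
function readUnsignedBigEndian(buffer, start, end) {
  const size = end - start;

  if (size <= 53) {
    // Fast path: a 53-bit unsigned value fits exactly in a JavaScript number.
    // Multiplication is used rather than <<, because bitwise operators on
    // numbers truncate to 32 bits.
    let value = 0;
    for (let bit = start; bit < end; bit++) {
      const b = (buffer[bit >> 3] >> (7 - (bit & 7))) & 1;
      value = value * 2 + b;
    }
    return value;
  }

  // Slow path: larger sizes need BigInt to stay exact.
  let value = 0n;
  for (let bit = start; bit < end; bit++) {
    const b = (buffer[bit >> 3] >> (7 - (bit & 7))) & 1;
    value = (value << 1n) | BigInt(b);
  }
  return value;
}

const bytes = new Uint8Array([0xab, 0xcd]);
console.log(readUnsignedBigEndian(bytes, 0, 4)); // the first four bits
console.log(readUnsignedBigEndian(bytes, 3, 16)); // an unaligned 13-bit read
```

Returning a number on one path and a BigInt on the other is a simplification made here; callers of this sketch would need to handle both types.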

Implications for `@external` JavaScript code 🌍

Existing JavaScript FFI code that operates on bit arrays needs to be updated. Until this is done such code will emit deprecation warnings at runtime due to use of deprecated APIs such as BitArray.length and BitArray.buffer.
If such code is called with an unaligned bit array it will round the bitSize up to a multiple of 8, and operate on the undefined low bits in the final byte, which will probably lead to the wrong output.
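
One way FFI code could avoid reading the undefined low bits is to mask the final byte before using it. The sketch below compares two bit arrays this way. The function name `bitArraysEqual` is hypothetical, and the inputs are plain stand-in objects with `bitSize` and `rawBuffer` properties (assumed here to be a bit count and a Uint8Array) rather than real prelude BitArray instances.

```javascript
// Hypothetical helper: compares two bit arrays given as { bitSize, rawBuffer }
// stand-ins, ignoring the unused low bits of the final byte.
function bitArraysEqual(a, b) {
  if (a.bitSize !== b.bitSize) return false;

  // Compare the bytes that are fully used.
  const wholeBytes = Math.floor(a.bitSize / 8);
  for (let i = 0; i < wholeBytes; i++) {
    if (a.rawBuffer[i] !== b.rawBuffer[i]) return false;
  }

  const trailingBits = a.bitSize % 8;
  if (trailingBits === 0) return true;

  // Keep only the `trailingBits` high bits of the final, partially used byte.
  const mask = (0xff << (8 - trailingBits)) & 0xff;
  return (a.rawBuffer[wholeBytes] & mask) === (b.rawBuffer[wholeBytes] & mask);
}

// Two 12-bit values that differ only in the unused low 4 bits.
const x = { bitSize: 12, rawBuffer: new Uint8Array([0xab, 0xc0]) };
const y = { bitSize: 12, rawBuffer: new Uint8Array([0xab, 0xcf]) };
console.log(bitArraysEqual(x, y));
```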
No existing code breaks because unaligned bit arrays on JavaScript weren't previously possible. Still, there could be code that is now valid on the JavaScript target, which wasn't valid previously, and which won't give the correct result for unaligned bit arrays.
I can make relevant updates to any affected packages to fix the deprecation warnings. Packages that only sensibly operate on whole bytes can error in the case of an unaligned bit array.
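
For packages that only make sense on whole bytes, the guard could look something like the sketch below. `toHex` is a hypothetical FFI function, and the objects passed to it are stand-ins that only mimic the `bitSize`, `byteSize`, and `rawBuffer` getters named in this PR (with `rawBuffer` assumed to be a Uint8Array), so the example runs without the prelude.

```javascript
// Hypothetical FFI helper that only supports whole bytes.
// `bits` is assumed to expose `bitSize`, `byteSize`, and `rawBuffer`.
function toHex(bits) {
  if (bits.bitSize % 8 !== 0) {
    throw new Error(`toHex: expected whole bytes, got ${bits.bitSize} bits`);
  }
  let out = "";
  for (let i = 0; i < bits.byteSize; i++) {
    out += bits.rawBuffer[i].toString(16).padStart(2, "0");
  }
  return out;
}

// Stand-ins for prelude BitArray values (assumption, not the real class).
const aligned = { bitSize: 16, byteSize: 2, rawBuffer: new Uint8Array([0xab, 0xcd]) };
const unaligned = { bitSize: 12, byteSize: 2, rawBuffer: new Uint8Array([0xab, 0xc0]) };

console.log(toHex(aligned));
try {
  toHex(unaligned);
} catch (error) {
  console.log(error.message);
}
```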

Implications for `gleam/stdlib` 🤝

I have the updates for gleam/stdlib ready to go, mostly affecting gleam/bit_array. It can only be merged once this PR goes in as its tests don't run on Gleam 1.6.3. It may be necessary to run the new stdlib tests on nightly for a short period, with them segregated into their own file so they can be included/excluded depending on the active Gleam version. I'll sort that out once this PR makes it through review.
Future stdlib versions that support unaligned bit arrays on JavaScript will work fine on Gleam versions < 1.7.0, so there are no compatibility concerns there.
We could print a warning if unaligned bit arrays are used on JavaScript and the package's stdlib version doesn't support them. Should we do this? If so, I'd prefer to implement it in a follow-up PR if that's ok.

Testing 🧪

There's certainly some complexity and tricky bitwise operations here, mostly in the JavaScript prelude. The following has been done to ensure correctness:

- Many new tests have been added to language_tests.gleam and test/javascript_prelude.
- Every path and branch through BitArray.slice(), BitArray.sliceToInt(), and BitArray.sliceToFloat() is covered by at least one test.
- Extensive fuzzing has been performed on bit array construction, slicing, and slicing to ints and floats.
  - This validated millions of combinations of bit array contents, segment sizes, offsets, endianness, signedness, etc. on JavaScript against the result on the Erlang target.
  - Issues found by this testing were fixed, and cases covering them were added to the language tests and prelude tests.

Limitations 🤔

The main limitation is that there is no allowance for unused high bits in the first byte of a bit array's buffer.

The motivation for allowing this would be to make bit array slices O(1) in all cases. Currently a slice is O(1) only if its start offset is byte-aligned (the end offset doesn't matter). If the start offset isn't byte-aligned then a slice is O(N) due to requiring a copy.

This makes the following O(N²) on JavaScript, but O(N) on Erlang:

```
pub fn print_bits(bits: BitArray) -> Nil {
  case bits {
    <<b:1, rest:bits>> -> {
      b |> int.to_string |> io.print
      print_bits(rest)
    }
    _ -> io.println("")
  }
}
```

This could be addressed at a later date, albeit with another round of impact on JavaScript FFI code that would need updating. So maybe it's better to bite the bullet now? Or maybe it's not important enough to warrant the additional complexity. There's also a reasonably good chance that any folks affected by this would be able to rework their code to avoid the performance issue (if they realise what the problem is).
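
To make the slicing cost concrete, here is a minimal sketch of the two cases. It is not the prelude's BitArray.slice(): `sliceBits` and its return shape are invented for illustration. A byte-aligned start can return a view over the same memory, while an unaligned start has to shift every byte into a fresh buffer, which is where the O(N) per slice comes from.

```javascript
// Hypothetical sketch, NOT the prelude's BitArray.slice().
// Returns bits [start, end) of `buffer` as { buffer, bitSize, copied }.
function sliceBits(buffer, start, end) {
  const bitSize = end - start;
  const byteSize = Math.ceil(bitSize / 8);
  const firstByte = Math.floor(start / 8);
  const shift = start % 8;

  if (shift === 0) {
    // Byte-aligned start: subarray() is a view over the same memory, so no
    // bytes are copied. Unused low bits of the last byte keep whatever
    // values the source had.
    return {
      buffer: buffer.subarray(firstByte, firstByte + byteSize),
      bitSize,
      copied: false,
    };
  }

  // Unaligned start: each output byte is stitched together from two
  // neighbouring input bytes, so the whole slice is copied.
  const out = new Uint8Array(byteSize);
  for (let i = 0; i < byteSize; i++) {
    const high = buffer[firstByte + i] << shift;
    const low = (buffer[firstByte + i + 1] ?? 0) >> (8 - shift);
    out[i] = (high | low) & 0xff;
  }
  return { buffer: out, bitSize, copied: true };
}

const source = new Uint8Array([0xab, 0xcd]);
console.log(sliceBits(source, 8, 13).copied); // aligned start
console.log(sliceBits(source, 4, 16).copied); // unaligned start
```

In print_bits above, every recursive call takes `rest` from bit offset 1, so each step would land on the copying branch of a slice like this one.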

✨✨✨

richard-viney marked this pull request as ready for review December 3, 2024 12:22

richard-viney force-pushed the js-unaligned-bit-arrays branch 3 times, most recently from 2496b81 to bb69dd9 Compare December 5, 2024 01:45

richard-viney mentioned this pull request Dec 5, 2024

Unaligned bit arrays on the JavaScript target gleam-lang/stdlib#761

Draft

richard-viney force-pushed the js-unaligned-bit-arrays branch 2 times, most recently from c5b24c8 to b347126 Compare December 13, 2024 08:05

Unaligned bit arrays on the JavaScript target

9aa5446

richard-viney force-pushed the js-unaligned-bit-arrays branch from b347126 to 9aa5446 Compare December 13, 2024 09:32
