safekeeper: optimize WAL decoding #10097
WAL record CRC32 verification takes 18% of CPU time (neon/libs/postgres_ffi/src/waldecoder_handler.rs, lines 227 to 228 in a4397d4).
This already uses a hardware-accelerated SIMD implementation (SSE 4.2). However, we only get parallelism with chunks of 3x8192 bytes or 3x256 bytes. We're making two calls here: one for the 20-byte header (not parallelized) and one for the actual WAL record. We can try to combine these into a single call or parallelize them. Furthermore, of the 18% total, only about 2% is spent when receiving WAL, while 16% is spent when sending. We're likely sending across an 8-shard tenant in this case (ingest benchmark), so we're verifying the same record 8 times. This will be addressed by #9337.
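For reference, here is a minimal sketch of the two-call pattern described above. It is not the code in `waldecoder_handler.rs`: it uses a plain table-driven software CRC-32C instead of the SSE 4.2 path, and the names `crc32c_append`, `record_crc`, `CRC_OFFSET` and `HEADER_LEN` are made up for illustration. The layout (CRC field at byte offset 20 of a 24-byte header, payload checksummed first, header prefix second) is an assumption based on the Postgres `XLogRecord` convention.

```rust
/// Reflected CRC-32C (Castagnoli) polynomial.
const CRC32C_POLY: u32 = 0x82F6_3B78;

/// Build the 256-entry lookup table at compile time.
const fn make_table() -> [u32; 256] {
    let mut table = [0u32; 256];
    let mut i = 0;
    while i < 256 {
        let mut crc = i as u32;
        let mut bit = 0;
        while bit < 8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ CRC32C_POLY } else { crc >> 1 };
            bit += 1;
        }
        table[i] = crc;
        i += 1;
    }
    table
}

static TABLE: [u32; 256] = make_table();

/// Continue a CRC-32C computation. `crc` is the finalized CRC of the bytes
/// seen so far (0 if none), so calls can be chained without copying data.
pub fn crc32c_append(crc: u32, data: &[u8]) -> u32 {
    let mut state = !crc; // undo the final inversion of the previous call
    for &byte in data {
        state = TABLE[((state ^ byte as u32) & 0xFF) as usize] ^ (state >> 8);
    }
    !state
}

// Assumed layout (hypothetical constants): the stored CRC occupies
// bytes 20..24 of a 24-byte record header.
const CRC_OFFSET: usize = 20;
const HEADER_LEN: usize = 24;

/// Checksum a record in two chained calls: first the payload after the
/// header, then the header bytes that precede the CRC field.
/// Panics if `record` is shorter than `HEADER_LEN`.
pub fn record_crc(record: &[u8]) -> u32 {
    let crc = crc32c_append(0, &record[HEADER_LEN..]);
    crc32c_append(crc, &record[..CRC_OFFSET])
}
```

Because the CRC state carries over between calls, the chained result is identical to checksumming one buffer that holds the payload followed by the header prefix. The short second call is the part that cannot benefit from the large-chunk parallelism mentioned above.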
Unsurprisingly, copying the data into a combined slice and running crc32 on that ends up being more expensive except for trivially small records:
Another option might be to just do CRC32 on the entire
We decided to punt this until we've implemented the cursor in #9337. We should make sure we do this then.
After moving WAL decoding to the Safekeeper side in #9746, we're seeing Safekeeper CPUs pinned at 100%, effectively bottlenecked on WAL decoding. CPU profiles have shown this to primarily be checksums/hashing and (de)serialization costs. We should try to optimize and/or parallelize this.
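As one illustration of the "parallelize" option, the sketch below verifies the checksums of a batch of already-split records on several scoped threads. It is only a sketch under stated assumptions, not how the Safekeeper does or would do it: `Record`, `verify_parallel` and the bitwise `crc32c` helper are hypothetical, and real code would use a hardware-accelerated CRC and most likely an existing runtime or thread pool.

```rust
use std::thread;

/// Bitwise CRC-32C (Castagnoli), kept tiny for the sketch. Real code would
/// call a hardware-accelerated implementation instead.
fn crc32c(data: &[u8]) -> u32 {
    let mut state = !0u32;
    for &byte in data {
        state ^= byte as u32;
        for _ in 0..8 {
            state = if state & 1 != 0 { (state >> 1) ^ 0x82F6_3B78 } else { state >> 1 };
        }
    }
    !state
}

/// Hypothetical record type: the payload plus the checksum that came with it.
pub struct Record {
    pub payload: Vec<u8>,
    pub expected_crc: u32,
}

/// Verify all records using up to `workers` scoped threads.
/// Returns the indices of records whose checksum does not match, ascending.
pub fn verify_parallel(records: &[Record], workers: usize) -> Vec<usize> {
    if records.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division so that at most `workers` chunks are produced.
    let chunk_len = (records.len() + workers - 1) / workers;

    thread::scope(|scope| {
        // Spawn every worker first, then join them in chunk order.
        let handles: Vec<_> = records
            .chunks(chunk_len)
            .enumerate()
            .map(|(chunk_idx, chunk)| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .enumerate()
                        .filter(|(_, r)| crc32c(&r.payload) != r.expected_crc)
                        .map(|(i, _)| chunk_idx * chunk_len + i)
                        .collect::<Vec<usize>>()
                })
            })
            .collect();

        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect::<Vec<usize>>()
    })
}
```

Splitting by record keeps each checksum independent, so no CRC-combine step is needed. Whether this helps in practice depends on record sizes and on how much of the cost is (de)serialization rather than checksumming.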