test_put_snapshot_local_filesystem seems to be flaky #2428

Open
tillrohrmann opened this issue Dec 16, 2024 · 4 comments
@tillrohrmann
Contributor

tillrohrmann commented Dec 16, 2024

The test test_put_snapshot_local_filesystem seems to be flaky: https://github.com/restatedev/restate/actions/runs/12356871562/job/34483774350#step:11:459.

It failed with

──── STDOUT:             restate-worker partition::snapshots::repository::tests::test_put_snapshot_local_filesystem

running 1 test
test partition::snapshots::repository::tests::test_put_snapshot_local_filesystem ... 2024-12-16T16:26:59.261752Z ERROR restate_worker::partition::snapshots::repository: error=Downloaded snapshot file Path { raw: "tmp/.tmpeWowbT/0/lsn_00000001734366419253-snap_14MqSnfDPgswhfdOH5MrdiF/data.sst" } has unexpected size: expected 13, got 0
FAILED

failures:

failures:
    partition::snapshots::repository::tests::test_put_snapshot_local_filesystem

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 64 filtered out; finished in 0.03s

cc @pcholakov

tillrohrmann changed the title from "test_put_snapshot_local_filesystem seems to be flakey" to "test_put_snapshot_local_filesystem seems to be flaky" on Dec 16, 2024
pcholakov self-assigned this on Dec 17, 2024
@pcholakov
Contributor

Interesting, I'd seen this during development, but I thought it was a bug I had already fixed. Looking into it.

@pcholakov
Contributor

Have we seen more occurrences of this, @tillrohrmann? I did a quick scan of recent GHA failures and didn't find any, but they are hard to filter for. I ran 1000 iterations of the test on macOS and on Ubuntu/x64 and got zero failures.

I added a simple check to verify the size post-upload; I'd like to also add explicit checksums to the metadata for added paranoia at some point soon.
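For illustration, a size check of this kind might look roughly like the Rust sketch below. It assumes the file is reachable on the local filesystem and reads its length via `std::fs::metadata`; the helper name `verify_file_size` is hypothetical and is not the repository's actual API, which goes through an object-store abstraction. The error text mirrors the "unexpected size" message from the failing run.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: fails if the file at `path` is not exactly `expected` bytes long.
/// Assumes a local filesystem path rather than an object-store key.
fn verify_file_size(path: &Path, expected: u64) -> io::Result<()> {
    // Ask the filesystem for the length of the file as it exists right now.
    let actual = fs::metadata(path)?.len();
    if actual != expected {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!(
                "snapshot file {} has unexpected size: expected {expected}, got {actual}",
                path.display()
            ),
        ));
    }
    Ok(())
}
```

Running such a check right after the put, and again after the get, would at least narrow down which side produced the zero-length file seen in the log.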

@tillrohrmann
Contributor Author

This is the first time I've seen this test fail.

@pcholakov
Copy link
Contributor

Not 100% related, but it might at least reveal whether the issue happens on put or on get. I created a draft PR that outlines what I had in mind for snapshot SST file checksums: #2436.
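A minimal sketch of the general idea: record a size and checksum for each SST file at put time, then verify the downloaded bytes against that record at get time. The `SnapshotFileEntry` type, the `describe`/`verify` helpers, and the choice of CRC-32 are all assumptions made for illustration; this is not the design in #2436.

```rust
/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320); slow but dependency-free.
fn crc32(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All-ones mask if the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical per-file entry stored in the snapshot metadata at put time.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SnapshotFileEntry {
    size: u64,
    crc32: u32,
}

/// Which check failed when a downloaded file was verified.
#[derive(Debug, PartialEq)]
enum Corruption {
    Size { expected: u64, actual: u64 },
    Checksum { expected: u32, actual: u32 },
}

/// Put path: describe the local file contents before uploading them.
fn describe(data: &[u8]) -> SnapshotFileEntry {
    SnapshotFileEntry { size: data.len() as u64, crc32: crc32(data) }
}

/// Get path: compare downloaded bytes against the entry recorded at put time.
fn verify(entry: &SnapshotFileEntry, data: &[u8]) -> Result<(), Corruption> {
    let actual_size = data.len() as u64;
    if actual_size != entry.size {
        return Err(Corruption::Size { expected: entry.size, actual: actual_size });
    }
    let actual_crc = crc32(data);
    if actual_crc != entry.crc32 {
        return Err(Corruption::Checksum { expected: entry.crc32, actual: actual_crc });
    }
    Ok(())
}
```

If the entry computed from the local file matches a read-back of the object immediately after upload, but `verify` fails on a later download, the problem lies on the get side; a mismatch on the immediate read-back points at the put.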
