Path base Implementing #534

Open
wants to merge 41 commits into
base: master
Conversation

huyngopt1994
Collaborator

@huyngopt1994 huyngopt1994 commented Aug 19, 2024

This PR includes the sub-PRs for the path-based implementation.

@huyngopt1994 huyngopt1994 force-pushed the path-base-implementing branch 2 times, most recently from 3cb37d6 to ca1f046 Compare September 17, 2024 05:01
@Francesco4203 Francesco4203 force-pushed the path-base-implementing branch from 88c7ca4 to b78c647 Compare October 17, 2024 08:06
@huyngopt1994 huyngopt1994 changed the title [WIP] Path base Implementing Path base Implementing Oct 18, 2024
@huyngopt1994 huyngopt1994 force-pushed the path-base-implementing branch from 69b0c7f to e42b5b8 Compare October 25, 2024 09:40
huyngopt1994 and others added 24 commits November 21, 2024 11:35
* core,trie,eth,cmd: rework preimage store

* ci: trigger unittest path-base-implementing
…liary tool to capture all deleted nodes which can't be captured by trie.Committer. The deleted nodes (#552)

can be removed from the disk later. Implement traverse and rework init Trie
* cmd, core/state, light, trie, eth: add trie owner notion

* all: refactor

* tests: fix goimports

* core/state/snapshot: fix ineffasigns

Co-authored-by: rjl493456442 <[email protected]>
Co-authored-by: Martin Holst Swende <[email protected]>
* core, trie: rework trie committer

Change the commit procedure and introduce a new struct called nodeSet
that holds all dirty nodes of a trie. Multiple nodeSets are merged into
a MergedNodeSet struct, which is then submitted to the in-memory
database from block to block.
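
As a rough illustration of the nodeSet and MergedNodeSet idea described in this commit, here is a minimal sketch. The field names, the `Owner` key, the duplicate check, and the use of a nil blob to mark a deletion are assumptions made for the example, not the types actually added by the commit.

```go
package main

import "fmt"

// NodeSet collects the dirty nodes of a single trie, keyed by node path.
// Hypothetical, simplified type: real nodes carry more than a raw blob.
type NodeSet struct {
	Owner string            // identifies the trie (empty for the account trie)
	Nodes map[string][]byte // path -> encoded node; a nil blob marks a deletion
}

func NewNodeSet(owner string) *NodeSet {
	return &NodeSet{Owner: owner, Nodes: make(map[string][]byte)}
}

// Add records a dirty node under its path.
func (s *NodeSet) Add(path string, blob []byte) { s.Nodes[path] = blob }

// MergedNodeSet groups the node sets of several tries by owner.
type MergedNodeSet struct {
	Sets map[string]*NodeSet
}

func NewMergedNodeSet() *MergedNodeSet {
	return &MergedNodeSet{Sets: make(map[string]*NodeSet)}
}

// Merge adds a set and rejects a second set for the same owner.
func (m *MergedNodeSet) Merge(set *NodeSet) error {
	if _, ok := m.Sets[set.Owner]; ok {
		return fmt.Errorf("duplicate trie for owner %q", set.Owner)
	}
	m.Sets[set.Owner] = set
	return nil
}

func main() {
	account := NewNodeSet("")
	account.Add("0a", []byte{0x01})

	storage := NewNodeSet("contract-1")
	storage.Add("0b", nil) // deleted node

	merged := NewMergedNodeSet()
	if err := merged.Merge(account); err != nil {
		panic(err)
	}
	if err := merged.Merge(storage); err != nil {
		panic(err)
	}
	fmt.Println("tries in merged set:", len(merged.Sets))
}
```

In this sketch the merged set is what a block-level commit would hand to the in-memory database in one step.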

* trie,core: fix comments
* core: store genesis allocation and recommit them if necessary (#24460)

* core: store genesis allocation and recommit them if necessary

* core: recover predefined genesis allocation if possible

* all: cleanup the APIs for initializing genesis (#25473)

* all: polish tests

* core: apply feedback from Guillaume

* core: fix comment

---------

Co-authored-by: rjl493456442 <[email protected]>
core, eth, les, trie: rework snap sync

Co-authored-by: rjl493456442 <[email protected]>
… ethdb, can be used independently of the chain database, reference by commit 1941c5e (#571)
* cmd, core, ethdb, node: rework ancient store folder reference by ethereum/go-ethereum@e44d655
* all: move genesis initialization to blockchain

* all: fix test
* core: add blockchain test for failing create/destroy-case

* core,state: some refactors

* core/rawdb: refactor db inspector for extending multiple ancient store
* core,eth,tests,trie: abstract node scheme, and construct database
interface instead of keyvalue to support storing reverse diff data
in ancient

* stacktrie,core,eth: port the changes in stacktrie, track the path prefix of nodes on commit, and use ethdb.Database for constructing trie.Database. This is not necessary right now, but the path-based scheme requires it to open the reverse diff freezer

* core,trie: add scheme and resolvepath logic
* trie: track deleted nodes

* core: track deleted nodes
* all: prep for path-based trie storage

* all: use rawdb.HasLegacyNode() to check for node existence instead of checking the length
* trie: implement NodeBlob API for trie iterator

This functionality is needed in the new path-based storage scheme, but
can be implemented in a separate PR.

When an account is deleted, all of its storage slots should be
removed from the disk as well. In the hash-based storage scheme they
are still left on disk, but in the new scheme they will be iterated
and marked as deleted.

But why is the NodeBlob API needed in this scenario? Because when
a node is marked deleted, its previous value must also be
recorded to construct the reverse diff.
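
To illustrate why the previous value matters, here is a minimal sketch of recording a node's old blob at deletion time so that the deletion can be undone later. The `tracer` type and its methods are hypothetical and only mimic the idea; they are not the NodeBlob API or the tracer in this branch.

```go
package main

import "fmt"

// tracer records the previous value of every node that gets deleted.
// Hypothetical, simplified stand-in for the tracking described above.
type tracer struct {
	deleted map[string][]byte // path -> value before deletion
}

func newTracer() *tracer { return &tracer{deleted: make(map[string][]byte)} }

// onDelete stores a copy of the old blob so the deletion can be reverted.
func (t *tracer) onDelete(path string, prev []byte) {
	t.deleted[path] = append([]byte(nil), prev...)
}

// revert restores all tracked deletions into the given store.
func (t *tracer) revert(store map[string][]byte) {
	for path, prev := range t.deleted {
		store[path] = prev
	}
}

func main() {
	store := map[string][]byte{"01": {0xaa}, "02": {0xbb}}
	tr := newTracer()

	// Delete every storage node, capturing the old blob first.
	for path, blob := range store {
		tr.onDelete(path, blob)
		delete(store, path)
	}
	fmt.Println("nodes after delete:", len(store))

	// The recorded previous values act as a reverse diff.
	tr.revert(store)
	fmt.Println("nodes after revert:", len(store))
}
```

Without the old blob, the iteration could only report that a path was removed, which is not enough to roll the state back.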

* fuzzers/stacktrie: enable test

---------

Co-authored-by: Gary Rong <[email protected]>
* trie: refactor tracer

* fix: add description
* trie: add wrapper for database

* trie: refactor trie node

* all: fix test

* rawdb, trie: fix comment

trie: change name WithPrev => NodeWithPrev
rawdb: add schema_test
* trie: triestate/Set to track changes

* core/state: track state changes

journal.go: in resetObjectChange
- add account in resetObjectChange (ref ethereum/go-ethereum#27339)
- add prevAccount and prevStorage (ref ethereum/go-ethereum#27376)
- add prevAccountOrigin and prevStorageOrigin to track changes
state_object.go: add origin for tracking the original StateAccount before change
statedb.go:
- add accountsOrigin and storagesOrigin, same functions as above
- stateObjectsDestruct now tracks the previous state before destruct
- add functions for handling destructed old states

* all: apply changes to tests
* core/state: clean up: db already exist in stateObject

* core, trie: statedb also commit the block number
* all: clean up overall structure, preparing for path-based (#594)

* trie/triedb/pathdb: init pathdb components

* core, trie: track state change with address instead of hash

Reference: ethereum/go-ethereum@817553c

* trie: refactor

* rawdb: implement freezer resettable & state freezer  (#596)

* rawdb: implement freezer resettable

* rawdb: implement state freezer

* rawdb: update description

* trie: path based scheme implementing (#598)

* core/state: move account definition to core/types

Reference: ethereum/go-ethereum#27323

* trie: add path base utils

* triedb: implement history and adding some test utils

* trie/triedb/pathdb: implement difflayer and disklayer

* Fix some issues related to history, and handle maxbyte being zero when retrieving ancient ranges

* trie/triedb/pathdb: implement database.go

* freezer: add unit test and docs for freezer reads with no size limit

* trie/triedb/pathdb: add database and difflayer tests

* triedb/pathdb: implement journal and add more comments

---------

Co-authored-by: Huy Ngo <[email protected]>

---------

Co-authored-by: Francesco4203 <[email protected]>
huyngopt1994 and others added 17 commits November 21, 2024 11:35
* trie: enable pathdb: add path config and enable tests

* core/rawdb: now also inspect the state freezer in pathdb; rename

* cmd: working on cmd ronin

* core: refactor; add pathbase config; fix tests

- all: fix and enable tests for pathbase
- blockchain: open triedb explicitly in blockchain functions and close right after use, since diskLayer inside pathdb is a skeleton
- blockchain: when writeBlockWithState, pathbase will skip the explicit garbage collector, which is only needed for hashbase
- genesis.go: nit: change check genesis state, ref ethereum/go-ethereum@08bf8a6

* tests: enable path tests

* eth: enable path scheme

- all: fix tests, enable path scheme tests
- state_accessor: split function to retrieve statedb from block to hash scheme and path scheme

* light, miner, les, ethclient: clean up tests

* trie: refactor triereader, return err when state reader won't be created in hash and path

* trie: fix failed test in iterator and sync test tie

* trie,core: improve trie reader and add a nil check on config when initializing
the database

* trie: once a statedb instance is committed it is no longer usable; a new instance must be created from the updated database with the new root, reference by commit 6d2aeb4

* cmd,les,eth: fix unit tests and add the Parallel flag correctly

* core, eth: fix tests

* core: refactor and fix sync_test logic

* tmp: disable pathbase for TestIsPeriodBlock, TestIsTrippEffective

---------

Co-authored-by: Huy Ngo <[email protected]>
…me object when passing parents in consortium v1 (#608)

* cmd,eth: fix wrong comparison logic when the data dir is empty and move the error check to the correct place

* docker: pass state.scheme when initializing the genesis data

* rawdb: add missing freezer in collections

* v1/consortium:  create a copy to keep parents content

In the snapshot function, the parents list is popped gradually to read its contents, so by the time apply is called the list is empty. Creating a copy at the beginning fixes it.
This has been fixed in consortium v2. For a full sync scenario, however, the first blocks are still processed with consortium v1, which causes our node to panic.
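
The pattern behind this fix can be sketched as follows. The `header` type and the `snapshotThenApply` and `apply` functions are simplified, hypothetical stand-ins for the consortium v1 code, shown only to make the emptied-slice problem concrete.

```go
package main

import "fmt"

type header struct{ Number uint64 }

// apply stands in for the later step that still needs the full parents list.
func apply(parents []*header) int { return len(parents) }

// snapshotThenApply walks back through parents by popping the local slice,
// then calls apply. It returns how many headers were walked and how many
// apply received.
func snapshotThenApply(parents []*header) (walked, applied int) {
	// Keep a copy: the loop below shrinks the local `parents` slice.
	saved := make([]*header, len(parents))
	copy(saved, parents)

	for len(parents) > 0 {
		parents = parents[:len(parents)-1] // pop the last parent
		walked++
	}

	// Passing `parents` here would hand over an empty list.
	applied = apply(saved)
	return walked, applied
}

func main() {
	parents := []*header{{Number: 1}, {Number: 2}, {Number: 3}}
	walked, applied := snapshotThenApply(parents)
	fmt.Println(walked, applied)
}
```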

---------

Co-authored-by: Francesco4203 <[email protected]>
* eth/downloader: prevent pivot moves after state commit (#28126)

* core, eth/downloader: fix genesis state missing due to state sync (#28124)

* core: fix chain repair corner case in path-based scheme

* eth/downloader: disable trie database whenever state sync is launched

---------

Co-authored-by: Péter Szilágyi <[email protected]>
Co-authored-by: rjl493456442 <[email protected]>
commit ethereum/go-ethereum@65ed1a6.

This change speeds up trie hashing and all other activities that require
RLP encoding of trie nodes by approximately 20%. The speedup is achieved by
avoiding reflection overhead during node encoding.

The interface type trie.node now contains a method 'encode' that works with
rlp.EncoderBuffer. Management of EncoderBuffers is left to calling code.
trie.hasher, which is pooled to avoid allocations, now maintains an
EncoderBuffer. This means memory resources related to trie node encoding
are tied to the hasher pool.

This also refactors some functions in rlp package.
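
The shape of this change can be sketched without the rlp package. In the example below, `bytes.Buffer` stands in for `rlp.EncoderBuffer` and the encoding is a toy one-byte length prefix rather than real RLP; the node types are simplified and all names are illustrative assumptions.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// node writes itself into a caller-supplied buffer, so no reflection
// is needed to discover its fields.
type node interface {
	encode(w *bytes.Buffer)
}

type valueNode []byte

// encode uses a toy one-byte length prefix; this is NOT real RLP.
func (n valueNode) encode(w *bytes.Buffer) {
	w.WriteByte(byte(len(n)))
	w.Write(n)
}

type shortNode struct {
	Key []byte
	Val node
}

func (n *shortNode) encode(w *bytes.Buffer) {
	w.WriteByte(byte(len(n.Key)))
	w.Write(n.Key)
	n.Val.encode(w)
}

// hasher owns a reusable buffer; hashers are pooled to avoid allocations,
// so the buffer's memory is tied to the pool.
type hasher struct{ buf bytes.Buffer }

var hasherPool = sync.Pool{New: func() any { return new(hasher) }}

// encodeNode returns a copy of the node's encoding using a pooled hasher.
func encodeNode(n node) []byte {
	h := hasherPool.Get().(*hasher)
	defer hasherPool.Put(h)
	h.buf.Reset()
	n.encode(&h.buf)
	return append([]byte(nil), h.buf.Bytes()...)
}

func main() {
	n := &shortNode{Key: []byte{0x01, 0x02}, Val: valueNode("hi")}
	fmt.Printf("%x\n", encodeNode(n)) // prints 020102026869
}
```

Each node type knows its own layout, which is the property that lets the encoder skip reflection.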

goos: linux
goarch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
                          │   old.txt    │               new.txt                │
                          │    sec/op    │    sec/op     vs base                │
DeriveSha200/std_trie-8     725.1µ ± 31%   613.8µ ± 37%        ~ (p=0.481 n=10)
DeriveSha200/stack_trie-8   572.3µ ± 10%   493.1µ ± 13%  -13.85% (p=0.005 n=10)
geomean                     644.2µ         550.1µ        -14.61%

                          │   old.txt    │               new.txt                │
                          │     B/op     │     B/op      vs base                │
DeriveSha200/std_trie-8     287.4Ki ± 0%   283.0Ki ± 0%   -1.53% (p=0.000 n=10)
DeriveSha200/stack_trie-8   56.34Ki ± 0%   42.43Ki ± 0%  -24.69% (p=0.000 n=10)
geomean                     127.2Ki        109.6Ki       -13.88%

                          │   old.txt   │               new.txt               │
                          │  allocs/op  │  allocs/op   vs base                │
DeriveSha200/std_trie-8     2.931k ± 0%   2.917k ± 0%   -0.46% (p=0.000 n=10)
DeriveSha200/stack_trie-8   1.462k ± 0%   1.246k ± 0%  -14.77% (p=0.000 n=10)
geomean                     2.070k        1.907k        -7.90%

                         │   old.txt    │               new.txt                │
                         │    sec/op    │    sec/op     vs base                │
Prove-8                    664.0µ ± 21%   450.2µ ± 27%  -32.20% (p=0.000 n=10)
VerifyProof-8              8.643µ ± 18%   9.009µ ± 33%        ~ (p=0.684 n=10)
VerifyRangeProof10-8       99.18µ ± 25%   67.60µ ± 67%        ~ (p=0.089 n=10)
VerifyRangeProof100-8      496.3µ ± 20%   487.0µ ± 33%        ~ (p=0.739 n=10)
VerifyRangeProof1000-8     5.149m ± 32%   4.095m ± 49%        ~ (p=0.971 n=10)
VerifyRangeProof5000-8     19.79m ± 60%   19.16m ± 28%        ~ (p=0.631 n=10)
VerifyRangeNoProof10-8     499.0µ ± 15%   422.8µ ± 29%  -15.25% (p=0.035 n=10)
VerifyRangeNoProof500-8    1.747m ± 30%   1.417m ± 24%  -18.91% (p=0.023 n=10)
VerifyRangeNoProof1000-8   3.025m ± 29%   2.239m ± 33%  -25.98% (p=0.009 n=10)
geomean                    750.9µ         622.6µ        -17.09%

                     │    old.txt    │               new.txt                │
                     │    sec/op     │    sec/op     vs base                │
HashFixedSize/10-8      60.30µ ± 19%   44.84µ ± 17%  -25.64% (p=0.000 n=10)
HashFixedSize/100-8     205.9µ ± 32%   145.2µ ± 19%  -29.48% (p=0.000 n=10)
HashFixedSize/1K-8     1326.5µ ± 23%   939.2µ ± 25%  -29.20% (p=0.002 n=10)
HashFixedSize/10K-8     14.77m ± 25%   12.74m ± 19%        ~ (p=0.075 n=10)
HashFixedSize/100K-8    135.2m ± 19%   104.1m ± 18%  -23.03% (p=0.003 n=10)
geomean                 2.011m         1.520m        -24.43%

                     │    old.txt    │               new.txt                │
                     │     B/op      │     B/op      vs base                │
HashFixedSize/10-8     11.729Ki ± 0%   9.752Ki ± 0%  -16.85% (p=0.000 n=10)
HashFixedSize/100-8     58.56Ki ± 0%   49.23Ki ± 0%  -15.93% (p=0.000 n=10)
HashFixedSize/1K-8      578.1Ki ± 0%   481.5Ki ± 0%  -16.72% (p=0.000 n=10)
HashFixedSize/10K-8     6.019Mi ± 0%   4.985Mi ± 0%  -17.18% (p=0.000 n=10)
HashFixedSize/100K-8    59.53Mi ± 0%   49.29Mi ± 0%  -17.20% (p=0.000 n=10)
geomean                 683.5Ki        568.8Ki       -16.78%

                     │   old.txt   │              new.txt               │
                     │  allocs/op  │  allocs/op   vs base               │
HashFixedSize/10-8      149.0 ± 0%    142.0 ± 0%  -4.70% (p=0.000 n=10)
HashFixedSize/100-8     772.0 ± 0%    739.0 ± 0%  -4.27% (p=0.000 n=10)
HashFixedSize/1K-8     7.443k ± 0%   7.099k ± 0%  -4.62% (p=0.000 n=10)
HashFixedSize/10K-8    77.09k ± 0%   73.32k ± 0%  -4.89% (p=0.000 n=10)
HashFixedSize/100K-8   767.8k ± 0%   730.5k ± 0%  -4.86% (p=0.000 n=10)
geomean                8.729k        8.321k       -4.67%

Co-authored-by: Qian Bin <[email protected]>
Co-authored-by: Felix Lange <[email protected]>
* core, accounts, eth, trie: handle genesis state missing (#28171)

* core, accounts, eth, trie: handle genesis state missing

* core, eth, trie: polish

* core: manage txpool subscription in mainpool

* eth/backend: fix test

* cmd, eth: fix test

* core/rawdb, trie/triedb/pathdb: address comments

* eth, trie: address comments

* eth: inline the function

* eth: use synced flag

* core/txpool: revert changes in txpool

* core, eth, trie: rename functions

* trie: remove internal nodes between shortNode and child in path mode (#28163)

* trie: remove internal nodes between shortNode and child in path mode

* trie: address comments

* core/rawdb, trie: address comments

* core/rawdb: delete unused func

* trie: change comments

* trie: add missing tests

* trie: fix lint

---------

Co-authored-by: rjl493456442 <[email protected]>
…s in path scheme state management) (#619)

* trie/triedb/pathdb, core/rawdb: enhance error message in freezer (#28198)

This PR adds more error messages for debugging purposes.

* trie/triedb/pathdb: improve dirty node flushing trigger (#28426)

* trie/triedb/pathdb: improve dirty node flushing trigger

* trie/triedb/pathdb: add tests

* trie/triedb/pathdb: address comment

* core/rawdb: fsync the index file after each freezer write (#28483)

* core/rawdb: fsync the index and data file after each freezer write

* core/rawdb: fsync the data file in freezer after write

---------

Co-authored-by: rjl493456442 <[email protected]>
…tium. (#624)

During snap sync with the path scheme enabled, access to the triedb is disabled (it is marked stale) to protect the persistent storage. Validator data is only needed to check the first few blocks, so we return the hardcoded list from the genesis data instead, following the snap sync flow from the go-ethereum team.
* trie: refactor stacktrie (#28233)

This change refactors stacktrie to separate the stacktrie itself from the
internal representation of nodes: a stacktrie is not a recursive structure
of stacktries, rather, a framework for representing and operating upon a set of nodes.

---------

Co-authored-by: Gary Rong <[email protected]>

* trie: remove owner and binary marshaling from stacktrie (#28291)

This change
  - Removes the owner-notion from a stacktrie; the owner is only ever needed for committing to the database, but the commit-function, the `writeFn`, is provided by the caller, so the caller can just set the owner into the `writeFn` instead of having it passed through the stacktrie.
  - Removes the `encoding.BinaryMarshaler`/`encoding.BinaryUnmarshaler` interface from stacktrie. We're not using it, and it is doubtful whether anyone downstream is either.

* core, trie, eth: refactor stacktrie constructor

This change enhances the stacktrie constructor by introducing an option struct. It also simplifies the `Hash` and `Commit` operations, getting rid of the special handling around the root node.

* core, eth, trie: filter out boundary nodes and remove dangling nodes in stacktrie (#28327)

* core, eth, trie: filter out boundary nodes in stacktrie

* eth/protocol/snap: add comments

* Update trie/stacktrie.go

Co-authored-by: Martin Holst Swende <[email protected]>

* eth, trie: remove onBoundary callback

* eth/protocols/snap: keep complete boundary nodes

* eth/protocols/snap: skip healing if the storage trie is already complete

* eth, trie: add more metrics

* eth, trie: address comment

---------

Co-authored-by: Martin Holst Swende <[email protected]>

---------

Co-authored-by: Martin Holst Swende <[email protected]>
Co-authored-by: Gary Rong <[email protected]>
…626)

* core/rawdb: improve state scheme checking (#28724)

This pull request improves the condition to check if path state scheme is in use. 

Originally, root node presence was used as the indicator of whether the path scheme is in use. However, because the root node is deleted during the initial snap sync, this condition is no longer useful.

If PersistentStateID is present, it shows that we've already configured for path scheme.
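
A minimal sketch of the difference between the two checks, using a plain map as the database. The key names and the exact order of checks are assumptions for illustration; the real logic in core/rawdb may differ.

```go
package main

import "fmt"

// Hypothetical key names used only in this sketch.
const (
	rootNodeKey          = "root-node"
	persistentStateIDKey = "persistent-state-id"
)

// usesPathSchemeOld treats root node presence as the only indicator.
func usesPathSchemeOld(db map[string][]byte) bool {
	_, ok := db[rootNodeKey]
	return ok
}

// usesPathScheme also accepts a stored persistent state ID, which
// survives the root node being deleted during the initial snap sync.
func usesPathScheme(db map[string][]byte) bool {
	if _, ok := db[rootNodeKey]; ok {
		return true
	}
	_, ok := db[persistentStateIDKey]
	return ok
}

func main() {
	// Database in the middle of snap sync: root deleted, state ID kept.
	db := map[string][]byte{persistentStateIDKey: {0x01}}

	fmt.Println("old check:", usesPathSchemeOld(db))
	fmt.Println("new check:", usesPathScheme(db))
}
```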

* core, triedb/pathdb: calculate the size for batch pre-allocation (#29106)

* core, triedb/pathdb: calculate the size for batch pre-allocation

* triedb/pathdb: address comment

* triedb/pathdb: fix panic in recoverable (#29107)

* triedb/pathdb: fix panic in recoverable

* triedb/pathdb: add todo

* triedb/pathdb: rename

* triedb/pathdb: rename

---------

Co-authored-by: rjl493456442 <[email protected]>
* trie: remove inconsistent trie nodes during sync in path mode (#28595)

This fixes a database corruption issue that could occur during state healing.
When sync is aborted while certain modifications were already committed, and a reorg occurs, the database would contain incorrect trie nodes stored by path.
These nodes need to be detected/deleted in order to obtain a complete and fully correct state after state healing.

---------

Co-authored-by: Felix Lange <[email protected]>

* core, cmd, trie: fix the condition of pathdb initialization (#28718)

Original problem was caused by #28595, where we made it so that as soon as we start to sync, the root of the disk layer is deleted. That is not wrong per se, but another part of the code uses the "presence of the root" as an init-check for the pathdb. And, since the init-check now failed, the code tried to re-initialize it which failed since a sync was already ongoing.

The total impact being: after a state-sync has begun, if the node for some reason is shut down, it will refuse to start up again, with the error message: `Fatal: Failed to register the Ethereum service: waiting for sync.`.

This change also modifies how `geth removedb` works, so that the user is prompted for two things: `state data` and `ancient chain`. The former includes both the chaindb as well as any state history stored in ancients.

---------

Co-authored-by: Martin HS <[email protected]>

---------

Co-authored-by: rjl493456442 <[email protected]>
Co-authored-by: Felix Lange <[email protected]>
Co-authored-by: Martin HS <[email protected]>
* v2/consortium_test: only insert newly created blocks

* consortium_test.go: add comments
@huyngopt1994 huyngopt1994 force-pushed the path-base-implementing branch from 4187e39 to 190f2ff Compare November 21, 2024 04:36
3 participants