Hash common fields for message propagation #792
@Stebalien what is your take on question No. 4? Do you think it's better to use a separate topic, or the same one with a message ID function that's smart enough?
Summary of discussions in standup:
IMO, use a separate topic. Having to decode messages in the message ID function is, IMO, a bad idea in general. But I don't feel strongly about this.
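To make that trade-off concrete, here is a minimal Go sketch of what the "smart message ID function" alternative would involve, assuming go-libp2p-pubsub's `WithMessageIdFn` option; the `GMessage` shape and its JSON encoding are hypothetical stand-ins for the real gpbft types, and the point is only that every payload on the topic would have to be decoded just to compute an ID.

```go
package example

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"

	pubsub "github.com/libp2p/go-libp2p-pubsub"
	pb "github.com/libp2p/go-libp2p-pubsub/pb"
)

// GMessage is a hypothetical stand-in for the real gpbft message type; the
// point is only that a "smart" message ID function must decode every payload.
type GMessage struct {
	Sender    string `json:"sender"`
	Instance  uint64 `json:"instance"`
	Round     uint64 `json:"round"`
	Phase     string `json:"phase"`
	ChainHash string `json:"chain_hash"` // hash of the ECChain instead of the full chain
}

// smartMessageID derives the pubsub message ID from decoded fields rather
// than the raw bytes, so equivalent messages dedupe even if encodings differ.
// Every message on the topic pays the decode cost, and undecodable payloads
// still need some ID, which is the awkwardness argued against above.
func smartMessageID(pmsg *pb.Message) string {
	var m GMessage
	if err := json.Unmarshal(pmsg.GetData(), &m); err != nil {
		sum := sha256.Sum256(pmsg.GetData()) // fall back to hashing the raw bytes
		return hex.EncodeToString(sum[:])
	}
	sum := sha256.Sum256([]byte(m.Sender + "/" + m.Phase + "/" + m.ChainHash))
	return hex.EncodeToString(sum[:])
}

// How it would be wired into the pubsub options.
var _ = pubsub.WithMessageIdFn(smartMessageID)
```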
Don't we need quality information to support the ECChains? Otherwise, an attacker can just spam EC Chains and force everyone to store every possible EC chain because they'll have no idea which ones are actually useful. That is, I think what we need is an extra phase after quality.
We can limit it and discard chains aggressively after quality.
I don't really see how to do this securely (e.g., a flood can easily mask all real values). Especially before/during quality. Are we relying on some form of continuous rebroadcast?
You are correct that it is tricky, but I think it is possible with code that is defensive enough.
How would we know if a message is valid at all? Not knowing the chain when a GMessage arrives opens up another attack vector around invalid messages that cannot be fully validated until the hashed chains arrive. The adversary can also immediately publish both the GMessage with the hashes and the value associated with it across both topics.
An alternative design:
This way the spam attack vector would disappear (i.e. it won't be any worse than it is today). The catch, of course, is larger messages in the QUALITY phase. In terms of bandwidth consumption this approach would certainly be far better than what we have today. The question then becomes: would the QUALITY phase in this scenario be efficient enough to not put us back at square one, considering the sensitivity of gpbft progress to it? I think yes (but we can test this). Because:
Addressing just this:
If the ECChain hash is the same as the one used in the signing process, we can check the signature.
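A minimal sketch of that point, with ed25519 and an ad-hoc payload layout standing in for the real signing scheme and wire format: because the signed payload embeds the chain hash rather than the chain itself, a receiver holding only the hashed GMessage can verify the signature before the chain is resolved.

```go
package example

import (
	"crypto/ed25519"
	"crypto/rand"
	"crypto/sha256"
	"encoding/binary"
)

// signingPayload builds the bytes that get signed. It embeds the ECChain
// *hash*, not the chain itself, so a receiver that only knows the hash can
// still verify the signature. The field layout here is illustrative only.
func signingPayload(instance, round uint64, phase uint8, chainHash [32]byte) []byte {
	buf := make([]byte, 8+8+1+32)
	binary.BigEndian.PutUint64(buf[0:8], instance)
	binary.BigEndian.PutUint64(buf[8:16], round)
	buf[16] = phase
	copy(buf[17:], chainHash[:])
	return buf
}

func example() bool {
	pub, priv, _ := ed25519.GenerateKey(rand.Reader)

	encodedChain := []byte("encoded ECChain bytes")
	chainHash := sha256.Sum256(encodedChain)

	// Sender signs over the payload that contains only the chain hash.
	payload := signingPayload(7, 0, 1, chainHash)
	sig := ed25519.Sign(priv, payload)

	// Receiver reconstructs the same payload from the GMessage fields and
	// checks the signature without having resolved the full chain yet.
	return ed25519.Verify(pub, signingPayload(7, 0, 1, chainHash), sig)
}
```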
From @masih: We have data on how quality has progressed; if it completed sooner than delta, then quorum was reached.
The investigation into Quality and Prepare messages revealed that we could not reliably propagate a quorum of large Quality messages within 3 minutes.
Here is an example query to check to what extent propagation of QUALITY messages was an issue:
The issue we have with the data is that only network 44 progressed at a reasonable velocity past bootstrap and was left to run at scale for a day. In that network the delta was sufficiently large to dilute the answer to the question being asked. So we are back to investigating how to combat the spam attack vector in the 2-topic design.
Summary of discussions from standup:
@Stebalien could you read through the comment above and let us know what you think, please?
I'd like to tackle all the spam vectors we currently have, not add more. But as long as we have rebroadcast, this should work.
Why?
I'm confused. Either we know we need it or we don't know we need it. If we need it, we shouldn't put it into some space-limited cache; we should save it permanently. This permanent storage will be limited to 100 tipsets per valid participant. Honestly, I'm convinced we should do one of:
Direct responses below; I will think about your proposed options.
We need to resolve ECChains within the quality round, otherwise everyone could PREPARE on base.
We expect very few ECChains per instance, which protects against an adversary who is a sybil-ed participant. The upper size of the cache could be smaller than 100 × N. However, participants can also get swayed, so it would probably have to be 100 × N × Rounds × Phases if we had it on a per-participant basis.
After quality, before prepare.
It protects against nothing because we have no idea which ECChains are valid and which are invalid. What am I missing here?
What if we took a hybrid approach and only split out the chains during catch-up, including them in QUALITY otherwise? Basically:
I haven't thought this through completely, so there are likely some hidden complexities, but I don't think it's any more complex than the current proposal.
During quality, we learn which chains are valuable, so we proposed this "within-phase rebroadcast". During normal operation, ECChains land in the buffer and are utilised by quality messages. If an adversary decides to spam, we get a GMessage whose ECChain we don't know; we note its hash until the first round of ECChain rebroadcasts hits, at which point we can resolve the ECChain hash and proceed. Each time an ECChain proves useful in a valid message, its priority gets bumped up, so it won't get discarded. That is to protect against a 33% adversary that decides to start spamming valid GMessages, each with a unique ECChain.
I don't see exactly what that buys us; it is all the complexity, plus more, since we'd have to support both ways of doing things.
Re 1: I will look into whether we can convince ourselves that the gain there will be significant enough to solve the bandwidth problem. It could be an option, although it brings in a lot of the complexity of having to communicate hashes for a smaller gain.
But it's not possible to spam quality messages.
I'm still not understanding the probability part of this. Either it's included in a valid quality message and is therefore required, or it's not and is therefore spam.
My main proposal is to run a quality round first, collecting at most one quality message from each participant. Then run an EC chain round, accepting all EC chains that match hashes we expect from the quality round. This avoids having to try to "guess" which EC chains are valid and completely solves all the spam issues. Unfortunately, this means adding an additional round. We can avoid that additional round by including short ECChains inside quality messages, skipping the ECChain round if it looks like it's unnecessary.
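A self-contained Go sketch of that two-step flow (names are illustrative, not go-f3 API): quality messages announce chain hashes, and the subsequent EC chain round accepts only chains whose hash was announced, which is what removes the guessing game about which chains to store.

```go
package example

import "crypto/sha256"

// ChainHash is a hypothetical fixed-size digest of an encoded ECChain.
type ChainHash [32]byte

func hashChain(encodedChain []byte) ChainHash { return sha256.Sum256(encodedChain) }

// ChainCollector sketches the proposal: first record the chain hashes
// announced in valid QUALITY messages, then accept only the chains whose
// hashes were expected. Anything else is dropped as spam.
type ChainCollector struct {
	expected map[ChainHash]struct{} // hashes seen in valid quality messages
	resolved map[ChainHash][]byte   // full chains received in the ECChain round
}

func NewChainCollector() *ChainCollector {
	return &ChainCollector{
		expected: make(map[ChainHash]struct{}),
		resolved: make(map[ChainHash][]byte),
	}
}

// OnQualityMessage is called once per validated quality message (at most one
// per participant) during the quality round.
func (c *ChainCollector) OnQualityMessage(announcedHash ChainHash) {
	c.expected[announcedHash] = struct{}{}
}

// OnECChain is called during the ECChain round; chains that no quality
// message announced are rejected outright.
func (c *ChainCollector) OnECChain(encodedChain []byte) bool {
	h := hashChain(encodedChain)
	if _, ok := c.expected[h]; !ok {
		return false // not announced by any valid quality message: spam
	}
	c.resolved[h] = encodedChain
	return true
}
```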
While we are finalising the design, I am going to start work on this by implementing hash-based gpbft messaging and wiring in the on-the-side lookup from hash to chain.
This makes sense to me. My thoughts:
Another idea: an adaptive poll-based chain exchange protocol. Utilise a polling mechanism similar to cert exchange, but for chains, that populates a look-aside lookup table mapping hashes discovered in received GPBFT messages to chain proposals. The gist:
This one small point encapsulates a lot of complexity and possible latency.
I'd like to flesh those out in the standup and dig deeper into the comparison between approaches.
I can get behind an extra phase, or maybe an extra-long quality round, as the ECChains "round" could be transparent to gpbft.
I agree.
As long as it doesn't introduce timing issues... I think it's fine because we need to see a strong quorum of votes to time out in prepare. Although it does increase our chances of failing in prepare (slightly).
I think we can do that. Basically, have a separate module that receives both quality messages and EC Chains:
This should be fine because PREPARE will pause until we receive some quorum, or evidence that we cannot reach quorum.
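A rough Go sketch of such a module, under the assumption that it sits between pubsub and gpbft and that message and chain encodings are opaque byte slices; the names are made up for illustration, not go-f3's API.

```go
package example

import "crypto/sha256"

type ChainHash [32]byte

func hashChain(encodedChain []byte) ChainHash { return sha256.Sum256(encodedChain) }

// PendingBuffer sketches the "separate module" idea: GMessages whose ECChain
// is not yet resolvable are parked by chain hash and handed back to gpbft
// once the matching chain arrives.
type PendingBuffer struct {
	chains  map[ChainHash][]byte     // resolved hash -> encoded ECChain
	pending map[ChainHash][][]byte   // unresolved hash -> buffered raw GMessages
	deliver func(gmsg, chain []byte) // callback into gpbft with a fully resolved message
}

func NewPendingBuffer(deliver func(gmsg, chain []byte)) *PendingBuffer {
	return &PendingBuffer{
		chains:  make(map[ChainHash][]byte),
		pending: make(map[ChainHash][][]byte),
		deliver: deliver,
	}
}

// OnGMessage delivers immediately if the chain is already known, otherwise
// buffers the message. PREPARE simply does not see the vote until the chain
// resolves, which is fine because it pauses until quorum (or evidence that
// quorum cannot be reached).
func (b *PendingBuffer) OnGMessage(raw []byte, chainHash ChainHash) {
	if chain, ok := b.chains[chainHash]; ok {
		b.deliver(raw, chain)
		return
	}
	b.pending[chainHash] = append(b.pending[chainHash], raw)
}

// OnECChain records a resolved chain and flushes any messages waiting on it.
func (b *PendingBuffer) OnECChain(encodedChain []byte) {
	h := hashChain(encodedChain)
	b.chains[h] = encodedChain
	for _, raw := range b.pending[h] {
		b.deliver(raw, encodedChain)
	}
	delete(b.pending, h)
}
```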
2024-12-18 (some) standup notes:
Implement a chain exchange protocol over pubsub as a mechanism to propagate `ECChain` across the network with reasonable spam protection. To protect against spam, the mechanism employs two separate caches: one for chains generally discovered across the network, and one for chains explicitly looked up or broadcast by the local node. Both caches are capped LRUs, where LRU recency is used to prioritise which chains we cache while keeping the total memory footprint fixed. This approach is not the most memory-efficient, but it is simpler to implement since the LRU encapsulates a lot of the complexity. The code has a lot of TODOs marking places to improve or questions for the reviewer. Actioning most of the TODOs requires further refactoring across the code, which is intended to happen in separate commits. The code path introduced here is not yet integrated into the F3 host; future PRs will iteratively integrate the mechanism across the F3 host and other places. Part of #792
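As a rough illustration of the two-cache layout described above (assuming `hashicorp/golang-lru/v2` for the capped LRUs and sha256 as a stand-in for the real chain key; sizes and names are illustrative):

```go
package example

import (
	"crypto/sha256"

	lru "github.com/hashicorp/golang-lru/v2"
)

type ChainHash [32]byte

// ChainExchangeCache holds one capped LRU for chains passively discovered on
// the wire and a second, higher-priority LRU for chains the local node
// explicitly looked up or broadcast itself. Recency in each LRU is the
// (simple, not memory-optimal) prioritisation mechanism.
type ChainExchangeCache struct {
	discovered *lru.Cache[ChainHash, []byte] // chains seen via pubsub
	local      *lru.Cache[ChainHash, []byte] // chains we looked up or broadcast
}

func NewChainExchangeCache(discoveredSize, localSize int) (*ChainExchangeCache, error) {
	d, err := lru.New[ChainHash, []byte](discoveredSize)
	if err != nil {
		return nil, err
	}
	l, err := lru.New[ChainHash, []byte](localSize)
	if err != nil {
		return nil, err
	}
	return &ChainExchangeCache{discovered: d, local: l}, nil
}

func (c *ChainExchangeCache) key(chain []byte) ChainHash { return sha256.Sum256(chain) }

// AddDiscovered records a chain seen on the network; eviction is purely LRU.
func (c *ChainExchangeCache) AddDiscovered(chain []byte) { c.discovered.Add(c.key(chain), chain) }

// AddLocal records a chain the local node asked for or broadcast; it lives in
// the separate cache so network spam cannot evict it.
func (c *ChainExchangeCache) AddLocal(chain []byte) { c.local.Add(c.key(chain), chain) }

// Get checks the local cache first, then the discovered one; a hit also
// refreshes LRU recency, which is how useful chains keep their priority.
func (c *ChainExchangeCache) Get(h ChainHash) ([]byte, bool) {
	if chain, ok := c.local.Get(h); ok {
		return chain, true
	}
	return c.discovered.Get(h)
}
```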
The most significant offender here is ECChain, which is the same (or similar) for many messages and can end up large.
The idea is to hash the ECChain before sending the message and let the receiver resolve it from the hash.
In many scenarios, we will know which ECChain corresponds to the hash because we will have the same one, but for the less common case, we need to propagate these ECChains through pubsub. This can be done on a separate topic (or maybe even on the same topic).
That propagation channel should limit propagation to only the current instance, so the ECChain message should carry the instance number.
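An illustrative sketch of the resulting message shapes and the instance gate; these structs are assumptions for the sake of the example, not the actual go-f3 wire format.

```go
package example

import "crypto/sha256"

type ChainHash [32]byte

// HashedGMessage is an illustrative shape of a gpbft message once the common
// ECChain field is replaced with its hash.
type HashedGMessage struct {
	Instance  uint64
	Round     uint64
	Phase     uint8
	ChainHash ChainHash // instead of the full ECChain
	Signature []byte
}

// ChainExchangeMessage carries the full chain on the side channel. It carries
// the instance number so receivers can drop anything not for the current
// instance before doing further work.
type ChainExchangeMessage struct {
	Instance uint64
	Chain    []byte // encoded ECChain
}

// validateChainMessage is the instance gate described above: only chains for
// the current instance are accepted for propagation; accepted chains yield
// the hash used to resolve pending GMessages.
func validateChainMessage(msg ChainExchangeMessage, currentInstance uint64) (ChainHash, bool) {
	if msg.Instance != currentInstance {
		return ChainHash{}, false
	}
	return sha256.Sum256(msg.Chain), true
}
```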
There are a couple of architectural questions:
Decision: GPBFT should be unaware.
Decision: Buffer them, and the queue should reconstitute them when the ECChain arrives. Have a metric about it.
Partial decision: Hard limit, but we need strategies to keep the important ones if we get to the limit.
Notes:
After receiving an ECChain, we should add all prefix chains to the lookup buffer. This provides more information and improves efficiency. We should also preload the buffer with our own ECChain.
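A minimal sketch of the prefix idea, with a hypothetical `Tipset` type and an ad-hoc prefix hash standing in for the real chain key derivation: every prefix of a received chain is inserted into the lookup buffer, and the node's own proposal is preloaded the same way.

```go
package example

import "crypto/sha256"

type ChainHash [32]byte

// Tipset is a hypothetical stand-in for a single tipset entry in an ECChain.
type Tipset struct {
	Key []byte
}

// hashPrefix derives a hash for a chain prefix by folding in each tipset key
// in order; a placeholder for however the real chain key is computed.
func hashPrefix(prefix []Tipset) ChainHash {
	h := sha256.New()
	for _, ts := range prefix {
		h.Write(ts.Key)
	}
	var out ChainHash
	copy(out[:], h.Sum(nil))
	return out
}

// addWithPrefixes inserts the received chain and every one of its prefixes
// into the lookup buffer, so a GMessage that references any prefix can be
// resolved without waiting for that exact chain to be rebroadcast.
func addWithPrefixes(buffer map[ChainHash][]Tipset, chain []Tipset) {
	for end := 1; end <= len(chain); end++ {
		prefix := chain[:end]
		buffer[hashPrefix(prefix)] = prefix
	}
}

// preloadOwn seeds the buffer with the node's own proposal (and its prefixes)
// before the instance starts.
func preloadOwn(buffer map[ChainHash][]Tipset, own []Tipset) {
	addWithPrefixes(buffer, own)
}
```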
Tasks