-
Notifications
You must be signed in to change notification settings - Fork 201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Finitizing monotonic systems #97
Conversation
Signed-off-by: Andrew Helwer <[email protected]>
Signed-off-by: Andrew Helwer <[email protected]>
Signed-off-by: Andrew Helwer <[email protected]>
Signed-off-by: Andrew Helwer <[email protected]>
Signed-off-by: Andrew Helwer <[email protected]>
Signed-off-by: Andrew Helwer <[email protected]>
Signed-off-by: Andrew Helwer <[email protected]>
Signed-off-by: Andrew Helwer <[email protected]>
ee74ea5
to
e1e8edf
Compare
Signed-off-by: Andrew Helwer <[email protected]>
LGTM - Have you considered merging the blog post into the spec as comments? |
Renamed MC specs Changed tag to intermediate Signed-off-by: Andrew Helwer <[email protected]>
@lemmy resolved issues. Never really considered adding blog posts to specs as comments, mainly because I don't want to deal with line wrapping and as-written the blog post embeds specs rather than vice versa. |
When I read your blog post I ended up manually reconstructing the TLA+ spec by copy&pasting snippets from your blog. I would have preferred to read a well-documented specification instead. Anyway, LGTM. Not sure if @muenchnerkindl or anybody else is reviewing this PR. |
@lemmy I don't believe he is. |
Let no answer be a negative one. |
@ahelwer Thank you for your contributions. Please keep up the good work as a maintainer of this repository. :-) |
Oh excellent, thank you! I will aspire to maintain its quality and usefulness. |
|
||
Monotonicity == [][Monotonic]_counter | ||
|
||
Convergence == \A n, o \in Node : counter[n] = counter[o] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a liveness property, but a state predicate on the initial state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just defines what it means for the system to have converged, there's a liveness property in Finitize_CRDT.tla
with the definition Liveness == converge ~> S!Convergence
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The blog post stats that this is a liveness property:
Finally, the big one: this is an eventually-consistent system, so we want to validate its eventual consistency as a liveness property! We want to say it is always the case that eventually the vectors on all nodes will have the same value. Let’s define this goal state as:
If anything, the blog post has to be corrected. I presume the desired property would be \A n, o \in Node: []<>(counter[n] = counter[o])
, combined with asserting fairness over Gossip
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be misread but it says it defines the "goal state" as that, not the liveness property generally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"goal state" is rather ambiguous in a non-terminating system? At any rate, it seems easy to make less ambiguous.
|
||
Increment(n) == | ||
/\ ~converge | ||
/\ counter[n][n] < Divergence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is effectively the state constraint of Constraint_CRDT in addition to being one of Increment's enablement conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this keeps the counters from going beyond the divergence limit and thus ensuring the state space remains finite
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this is less of a hammer than what the blog post calls a "state restriction hammer".
By the way, we also "hardcode" the state constraints in this model, for example, to prevent TLC from evaluating an action when the original state constraint would evaluate to false. While this approach improves efficiency slightly, the impact is minimal. It does lower the value of "states generated", though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you say that adding a state constraint along with a GarbageCollect action would be equivalent to having the line counter[n][n] < Divergence
plus a GarbageCollect action?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, TLC consistently reports the same outcome (counterexample/pass), when exploring the different fairness constraints in CRDT below:
------------------------------- MODULE CRDT ---------------------------------
EXTENDS Naturals
CONSTANT Node
VARIABLE counter
vars == counter
TypeOK == counter \in [Node -> [Node -> Nat]]
Safety == \A n, o \in Node : counter[n][n] >= counter[o][n]
Convergence == []<>(\A n, o \in Node : counter[n] = counter[o])
Init == counter = [n \in Node |-> [o \in Node |-> 0]]
Increment(n) == counter' = [counter EXCEPT ![n][n] = @ + 1]
Gossip(n, o) ==
LET Max(a, b) == IF a > b THEN a ELSE b IN
counter' = [
counter EXCEPT ![o] = [
nodeView \in Node |->
Max(counter[n][nodeView], counter[o][nodeView])
]
]
Next ==
\/ \E n \in Node : Increment(n)
\/ \E n, o \in Node : Gossip(n, o)
Fairness ==
\* /\ WF_vars(Next)
\* /\ ~ \E n \in Node : <>[][Increment(n)]_vars
\* /\ \A n, o \in Node : <>[][Gossip(n,o)]_vars
\* /\ \A n \in Node : WF_vars(Increment(n))
/\ \A n, o \in Node : WF_vars(Gossip(n,o))
\* /\ Convergence
Spec ==
/\ Init
/\ [][Next]_counter
/\ Fairness
=============================================================================
------ MODULE MCCRDT -----
EXTENDS CRDT
CONSTANT Divergence
C == INSTANCE CRDT
MCIncrement(n) ==
/\ counter[n][n] < Divergence
/\ C!Increment(n)
GarbageCollect ==
LET SetMin(s) == CHOOSE e \in s : \A o \in s : e <= o IN
LET Transpose == SetMin({counter[n][o] : n, o \in Node}) IN
/\ counter' = [
n \in Node |-> [
o \in Node |-> counter[n][o] - Transpose
]
]
MCNext ==
\/ C!Next
\/ GarbageCollect
StateConstraint == \A n, o \in Node : counter[n][o] <= Divergence
======
SPECIFICATION Spec
CONSTANTS
Node = {n1, n2}
Next <- MCNext
INVARIANTS
TypeOK
Safety
PROPERTIES
Convergence
CONSTANTS Divergence = 2
CONSTANTS Increment <- MCIncrement
-----
SPECIFICATION Spec
CONSTANTS
Node = {n1, n2}
Next <- MCNext
INVARIANTS
TypeOK
Safety
PROPERTIES
Convergence
CONSTANTS Divergence = 2
CONSTRAINTS StateConstraint
Without GarbageCollect
, TLC reports pass with the state constraint but a counterexample with the increment conjunct with the \A n \in Node : WF_vars(Increment(n))
fairness constraint. However, that counterexample is bogus as it ends in stuttering, i.e., Increment
is disabled because of the conjunct:
Checking temporal properties for the complete state space with 36 total distinct states at (2024-10-16 12:13:58)
Error: Temporal properties were violated.
Error: The following behavior constitutes a counter-example:
State 1: <Initial predicate>
counter = (n1 :> (n1 :> 0 @@ n2 :> 0) @@ n2 :> (n1 :> 0 @@ n2 :> 0))
State 2: <C!Next line 6, col 1 to line 6, col 18 of module MCCRDT>
counter = (n1 :> (n1 :> 0 @@ n2 :> 0) @@ n2 :> (n1 :> 0 @@ n2 :> 1))
State 3: <C!Next line 6, col 1 to line 6, col 18 of module MCCRDT>
counter = (n1 :> (n1 :> 0 @@ n2 :> 1) @@ n2 :> (n1 :> 0 @@ n2 :> 1))
State 4: <C!Next line 6, col 1 to line 6, col 18 of module MCCRDT>
counter = (n1 :> (n1 :> 0 @@ n2 :> 1) @@ n2 :> (n1 :> 0 @@ n2 :> 2))
State 5: <C!Next line 6, col 1 to line 6, col 18 of module MCCRDT>
counter = (n1 :> (n1 :> 1 @@ n2 :> 1) @@ n2 :> (n1 :> 0 @@ n2 :> 2))
State 6: <C!Next line 6, col 1 to line 6, col 18 of module MCCRDT>
counter = (n1 :> (n1 :> 2 @@ n2 :> 1) @@ n2 :> (n1 :> 0 @@ n2 :> 2))
State 7: <C!Next line 6, col 1 to line 6, col 18 of module MCCRDT>
counter = (n1 :> (n1 :> 2 @@ n2 :> 1) @@ n2 :> (n1 :> 2 @@ n2 :> 2))
State 8: Stuttering
Finished checking temporal properties in 00s at 2024-10-16 12:13:58
181 states generated, 36 distinct states found, 0 states left on queue.
The depth of the complete state graph search is 7.
/\ converge' = TRUE | ||
/\ UNCHANGED counter | ||
|
||
GarbageCollect == |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no difference between checking Finitize_CRDT.tla/Finitize_CRDT.cfg with or without the sub-action GarbageCollect
present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is true, but this is a demo - if you are building a system on top of this CRDT which does other actions that progress the state beyond touching the counters, this ensures the system does not deadlock and can evolve as long as the counters stay within the convergence limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I believe it would explore the same state space but have many more state transitions (so more non-unique states, is how it would show up in the output).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With GarbageCollect
: 585,401 states generated, 50,000 distinct states found
Without GarbageCollect
: 535,401 states generated, 50,000 distinct states found
However, I would expect the liveness property to fail. I will look into this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, I see why the liveness property succeeds. Anyway this overall would support liveness properties being checked over a system that requires more steps than permitted by a simple state constraint to accomplish something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With
GarbageCollect
: 585,401 states generated, 50,000 distinct states foundWithout
GarbageCollect
: 535,401 states generated, 50,000 distinct states foundHowever, I would expect the liveness property to fail. I will look into this!
The "states generated" statistics is higher with GarbageCollect
because there is one more sub-action for which TLC generates successor states. All of these states are also reachable because of Gossip
or Increment
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also notice that Liveness
holds even if Fairness
is changed to WF_vars(Next)
. This is because Liveness
is trivially true of behaviors with no Converge
actions, and all other behavior guarantee that there are only finitely many Increment
steps. However, we would expect CRDT to violate its liveness property "eventually consistent" if fairness is asserted over Next.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no difference between checking Finitize_CRDT.tla/Finitize_CRDT.cfg with or without the sub-action
GarbageCollect
present.
Also notice that Liveness holds even if Fairness is changed to WF_vars(Next). This is because Liveness is trivially true of behaviors with no Converge actions, and all other behavior guarantee that there are only finitely many Increment steps. However, we would expect CRDT to violate its liveness property "eventually consistent" if fairness is asserted over Next.
Indeed, GarbageCollect
does make a difference once the converge
variable is eliminated (see #97 (comment)).
|
||
Fairness == \A n, o \in Node : WF_vars(Gossip(n, o)) | ||
|
||
StateConstraint == \A n, o \in Node : counter[n][o] <= 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StateConstraint
can be removed from Finitize_CRDT.tla.
|
||
CONSTANT Node | ||
|
||
VARIABLES counter, converge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really our convergence property will only become true if all requests stop for a while. We model this by adding a boolean variable, converge, which is arbitrarily set to TRUE at some point and stops additional increments from occurring. Since this is a virtual variable that wouldn’t exist in any real system, we should move it out into a model-checking spec that imports our “good copy” spec and wraps its definitions with converge logic:
Adding converge
changes the safety part of Spec
, i.e., Init /\ [][Next]_v
. In TLA, we don't have to change Init /\ [][Next]_v
but can conjoin a suitable fairness constraint. A suitable constraint, though not machine-closed, would be []<>S!Convergence
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure I follow, and this might be related to your suggestion of a liveness property up above of \A n, o \in Node: []<>(counter[n] = counter[o])
. I don't think that property would actually hold even if we could model-check an infinite-state-space spec, right? I can imagine a behavior where some node keeps incrementing before a gossip reaches it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hence the fairness constraint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does TLC support that? I thought it only supported fairness constraints. That's pretty neat! Do we lose anything by discarding machine closure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we call any liveness formula that we conjoin to Init /\ [][Next]_v
a fairness constraint!? For example, I call the liveness formulas []<>S!Convergence
, WF_v(Next)
, \A n, o \in Node: WF_v(Gossip(n,o))
, ... fairness constraints when conjoined to Init /\ [][Next]_v
.
In CRDT, []<>S!Convergence
works fine as a fairness constraint with TLC. It is also easy to see that Init /\ [][Next]_v /\ []<>S!Convergence => []<>S!Convergence
(assuming Liveness
is defined as []<>S!Convergence
). Init /\ [][Next]_v /\ \A n, o \in Node: WF_v(Gossip(n,o)) => []<>S!Convergence
is less easy to see but should also hold. However, that fairness constraint is machine closed. I would argue that we don't care about machine closure in CRDT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fairness constraint is a good term for it, I really meant I thought TLC only supported WF and SF assertions in that place.
Init /\ [][Next]_v /\ \A n, o \in Node: WF_v(Gossip(n,o)) => []<>S!Convergence
is less easy to see but should also hold.
Would this still hold if GarbageCollect is included? It seems like there should be an increment > gossip > increment > gossip > garbagecollect loop that would keep it from being satisfied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this still hold if GarbageCollect is included? It seems like there should be an increment > gossip > increment > gossip > garbagecollect loop that would keep it from being satisfied.
Apologies, I didn't pay enough attention to the definition of Gossip. We have to strengthen the fairness constraint to e.g. (non-mc) WF_vars(Next) /\ \A n, o \in Node : <>[][Gossip(n,o)]_vars
if Divergence>1
. This constraint is stronger than it has to be, though.
Us programmers can barely think straight about one computer, let alone three or more! | ||
|
||
Unfortunately, if you add state restrictions you've cut yourself off from many powerful TLA⁺ features (see page 247, section 14.3.5 of [*Specifying Systems*](https://lamport.azurewebsites.net/tla/book.html) for details). | ||
In particular, you'll lose the ability to analyze liveness (good things eventually happen) or abstraction & refinement (showing that one spec is related to another). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is too broad because checking the safety part of refinement LLInit /\ [][LLNext]_v => HLInit /\ [][HLNext]_v
works just fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't there a large problem with success being reported due to vacuous truth? Or is my understanding wrong; related tlaplus/tlaplus#839
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please elaborate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean when checking liveness with state restrictions. It doesn't feel particularly safe. Also where is the formula LLInit /\ [][LLNext]_v => HLInit /\ [][HLNext]_v
coming from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLInit and LLNext are the initial predicate and next-state relation of the low-level spec and HLInit and HLNext are the ones from the high-level/refined spec.
Following last week's updates (see #153), I believe that this example requires a significant revision. Some changes I believe are warranted are as follows:
Similar changes will have to be made to That being said, I would generally like to see more evidence that TLC consistently finds and generates valid counterexamples. While tlaplus/tlaplus#1045 is related to using |
The first change sounds good. Regarding the second change, since GarbageCollect is always enabled shouldn't that prevent a stuttering counterexample if divergence is greater than 0? Perhaps putting GarbageCollect as a view changes this. |
------------------------------- MODULE CRDT ---------------------------------
EXTENDS Naturals
CONSTANT Node
VARIABLE counter
vars == counter
TypeOK == counter \in [Node -> [Node -> Nat]]
Safety == \A n, o \in Node : counter[n][n] >= counter[o][n]
Monotonic == \A n, o \in Node : counter'[n][o] >= counter[n][o]
Monotonicity == [][Monotonic]_counter
Convergence == []<>(\A n, o \in Node : counter[n] = counter[o])
Init == counter = [n \in Node |-> [o \in Node |-> 0]]
Increment(n) ==
/\ counter[n][n] < 3
/\ counter' = [counter EXCEPT ![n][n] = @ + 1]
Gossip(n, o) ==
LET Max(a, b) == IF a > b THEN a ELSE b IN
counter' = [
counter EXCEPT ![o] = [
nodeView \in Node |->
Max(counter[n][nodeView], counter[o][nodeView])
]
]
SetMin(s) == CHOOSE e \in s : \A o \in s : e <= o
GarbageCollect ==
LET Transpose == SetMin({counter[n][o] : n, o \in Node}) IN
counter' = [
n \in Node |-> [
o \in Node |-> counter[n][o] - Transpose
]
]
Next ==
\/ \E n \in Node : Increment(n)
\/ \E n, o \in Node : Gossip(n, o)
\/ GarbageCollect
Spec ==
/\ Init
/\ [][Next]_counter
THEOREM Spec => []TypeOK
THEOREM Spec => []Safety
Fairness ==
/\ \A n \in Node : WF_vars(Increment(n))
\* /\ WF_vars(GarbageCollect) \* Also stuttering with this one
FairSpec ==
/\ Spec
/\ Fairness
THEOREM FairSpec => Convergence
THEOREM FairSpec => Monotonicity
=============================================================================
-> % tlc -note CRDT.tla
TLC2 Version 2.20 of Day Month 20?? (rev: 2360829)
Running breadth-first search Model-Checking with fp 8 and seed 1312136866782931339 with 1 worker on 10 cores with 7282MB heap and 64MB offheap memory [pid: 68076] (Mac OS X 15.0.1 aarch64, Homebrew 23 x86_64, MSBDiskFPSet, DiskStateQueue).
Parsing file /Users/markus/src/TLA/_specs/examples/specifications/FiniteMonotonic/CRDT.tla
Parsing file /private/var/folders/7d/4x6z2cc91jl588ysynlc1tfc0000gn/T/tlc-8175494506309268083/Naturals.tla (jar:file:/Applications/TLA+%20Toolbox.app/Contents/Eclipse/tla2tools.jar!/tla2sany/StandardModules/Naturals.tla)
Semantic processing of module Naturals
Semantic processing of module CRDT
Starting... (2024-10-21 16:17:56)
Implied-temporal checking--satisfiability problem has 1 branches.
Computing initial states...
Finished computing initial states: 1 distinct state generated at 2024-10-21 16:17:57.
Progress(9) at 2024-10-21 16:17:57: 621 states generated, 100 distinct states found, 0 states left on queue.
Checking temporal properties for the complete state space with 100 total distinct states at (2024-10-21 16:17:57)
Error: Temporal properties were violated.
Error: The following behavior constitutes a counter-example:
State 1: <Initial predicate>
counter = (n1 :> (n1 :> 0 @@ n2 :> 0) @@ n2 :> (n1 :> 0 @@ n2 :> 0))
State 2: <Increment(n1) line 23, col 5 to line 24, col 50 of module CRDT>
counter = (n1 :> (n1 :> 1 @@ n2 :> 0) @@ n2 :> (n1 :> 0 @@ n2 :> 0))
State 3: <Increment(n1) line 23, col 5 to line 24, col 50 of module CRDT>
counter = (n1 :> (n1 :> 2 @@ n2 :> 0) @@ n2 :> (n1 :> 0 @@ n2 :> 0))
State 4: <Increment(n1) line 23, col 5 to line 24, col 50 of module CRDT>
counter = (n1 :> (n1 :> 3 @@ n2 :> 0) @@ n2 :> (n1 :> 0 @@ n2 :> 0))
State 5: <Increment(n2) line 23, col 5 to line 24, col 50 of module CRDT>
counter = (n1 :> (n1 :> 3 @@ n2 :> 0) @@ n2 :> (n1 :> 0 @@ n2 :> 1))
State 6: <Increment(n2) line 23, col 5 to line 24, col 50 of module CRDT>
counter = (n1 :> (n1 :> 3 @@ n2 :> 0) @@ n2 :> (n1 :> 0 @@ n2 :> 2))
State 7: <Increment(n2) line 23, col 5 to line 24, col 50 of module CRDT>
counter = (n1 :> (n1 :> 3 @@ n2 :> 0) @@ n2 :> (n1 :> 0 @@ n2 :> 3))
State 8: <Gossip(n2,n1) line 28, col 3 to line 33, col 5 of module CRDT>
counter = (n1 :> (n1 :> 3 @@ n2 :> 3) @@ n2 :> (n1 :> 0 @@ n2 :> 3))
State 9: Stuttering
Finished checking temporal properties in 00s at 2024-10-21 16:17:57
621 states generated, 100 distinct states found, 0 states left on queue.
The depth of the complete state graph search is 9.
Finished in 00s at (2024-10-21 16:17:57) |
Okay these changes sound good to me! |
From the blog post:
Composing By the way, |
That's very interesting, I didn't actually know that TLC supported \cdot! I agree GarbageCollect isn't a great term but informally it is what other software engineers seem to land on as the closest concept. I guess it could be called like MonotonicReduction or something. |
The commit microsoft/CCF@818e345 (part of microsoft/CCF#6587) illustrates that defining |
Added example specs & models from this blog post: https://ahelwer.ca/post/2023-11-01-tla-finite-monotonic/