From 2203c5c4a37ac93002631896065466ad9903f6b9 Mon Sep 17 00:00:00 2001 From: Corey Petty Date: Thu, 2 Feb 2023 16:39:13 -0500 Subject: [PATCH] updated FAQ --- .../consensus/candidates/carnot/FAQ.md | 45 ++++++++++++++++++- .../consensus/candidates/carnot/overview.md | 8 +++- 2 files changed, 51 insertions(+), 2 deletions(-) diff --git a/content/roadmap/consensus/candidates/carnot/FAQ.md b/content/roadmap/consensus/candidates/carnot/FAQ.md index 8f9989aa4..bae6c7cf7 100644 --- a/content/roadmap/consensus/candidates/carnot/FAQ.md +++ b/content/roadmap/consensus/candidates/carnot/FAQ.md @@ -15,6 +15,31 @@ openToc: true > **Definition 4** (Probabilistic Fulfillment). _After the GST, and when the current and previous leaders are correct, the number of votes collected by teh current leader is $2c+1$ (w.h.p)._ +## Tradeoffs + +### I think the main clear disadvantage of such a scheme is the added latency of the multiple layers. - Alvaro + +> `Moh:` The added latency will be O(log(n/C)), where C is the committee size. But I guess it will be hard to avoid it. Though it also depends on how fast the network layer (potentially Waku) propagats msgs and also on execution time of the transaction as well. + +> `Alvaro:` Well IIUC the only latency we are introducing is directly proportional to the levels of subcommitee nesting (ie the log(n/C)), which is understandably the price to pay. We have to make sure though that what we gain by introducing this is really worth the extra cost vs the typical comittee formation via randao or perhaps VDFs + +> `Moh:` Again the Typical committee formation with randao can reduce their wait time value to match our latency, but then it becomes vulnerable and fail if the network latency becomes greater than their slot interval. If they keep it too large it may not fail but becomes slow. We won't have that problem. If an adversary has the power to slow down the network then their liveness will fail, whereas we won't have that issue. + +## How would you compare Aptos and Carnot? - Alvaro + +> `Moh:` It is variant of DiemBFT, Sui is based on Nahrwal, both cannot scale to more than few hunderd of nodes. That is why they achieve that low latency. + +> `Alvaro:` Yes, so they need to select a committee of that size in order to operate at that latency What's wrong with selecting a committee vs Carnot's solution? This I'm asking genuinely to understand and because everyone will ask this question when we release. + +> `Moh:` When you select a committee you have to wait for a time slot to make sure the result of consensus has propagated. Again strong synchrony assumption (slot time), formation of forks, increase in PoS attack vector come into play +Within committee the protocol does not need a wait time but for its results to get propagated if scalability is to be achieved, then wait time has to be added or signatures have to be collected from thousands of nodes. + +> `Alvaro:` Can you elaborate? + +> `Moh:` Ethereum (and any other protocol who runs the consensus in a single committee selected from a large group on nodes) has wait time so that the output of the consenus propagates to all honest nodes before the next committee is selected. Else the next committee will fail or only forks will be formed and chain length won't increase. But since this wait time as stated, increases latency, makes the protocol vulnerable, Ethereum wants to avoid it to achieve responsivess. To avoid wait time (add responsiveness) a protocol has to collect attestation signatures from 2/3rd of all nodes (not a single committee) to move to the second round (Carnot is already responsive). But aggregating and verifying signatures thousands of signatures is expensive and time consuming. This is why they are working to improve BLS signatures. Instead we have changed the consensus protocol in such a way that a small number of signatures need to be aggregated and verified to achieve responsiveness and fast finality. We can further improve performance by using the improved BLS signatures. + +> One cannot achieve fast finality while running the consensus in a small committee. Because attestation of a Block within the single committee is not enough. This block can be averted if the leader of the next committee has not seen it. Therefore, there should be enough delay so that all honest nodes can see it. This is why we have this wait/slot time. Another issue can be a malicious leader from the next chosen committee can also avert a block of honest leader and hence preventing honest leaders from getting rewards. If blocks of honest leaders are averted for long time, stake of malicious leaders will increase. Moreover, malicious leaders can delay blocks of honest nodes by making fork and averting them. Addressing these issues will further make the protocol complex, while still laking fast finality. + ## Data Distribution ### How much failure rate of erasure code transmission are we expecting. Basically, what are the EC coding parameters that we expect to be sending such that we have some failure rate of transmission? Has that been looked into? - Dmitriy @@ -49,6 +74,20 @@ openToc: true > An honest node should wait for a specific number of children votes (to make sure everyone is voting on the same proposal) before voting but does not need to provide any cryptographic proof. Though we build a threshold signature from root committee members and it’s children but not from the whole tree. As long as enough number of nodes follow the the protocol we should be fine. I am working on protocol proofs. Also I think bugs should be discovered during development and testing phase. Changing protocol to detect potential bug might not be a good practice. +### doesn't having randomly distributed malicious nodes (say there is a 20%) increase the odds that over a third of a committee end up being from those malicious ones? It seems intuitive: since a 20% at the global scale is always <1/3, but when randomly distributed there is always non-zero chance they end up in a single group, thus affecting liveness more and more the closer we get to that global 1/3. Consequently, if I'm understanding the algorithm correctly, it would have worse liveness guarantees that classical pBFT, say with a randomly-selected commitee from the total set. - Alvaro + +> `Alexander:` We assume that fraction of malicious nodes is $1/4$ and given we chooses comm. sizes, which will depend on total number of nodes, appropriately this guarantees that with high probability we are below $1/3$ in each committee. + +> `Alvaro:` ok, but then both the global guarantee is below that current "standard" of 1/3 of malicious nodes and even then we are talking about non-zero probabilities that a comm has the power to slow down consensus via requiring reformation of comms (is this right?) + +> `Alexander:` This is the price we pay to improve scalability. Also these probabilities of failure can be very low. + +### What happens in Carnot when one committee is taken over by >1/3 intra-comm byzantine nodes? - Alvaro + +> `Moh:` When there is a failure the overlay is recalculated. By gradually increasing the fault tolerance by a small value, the probability of failure of a committee slightly increases but upon recalculating the correct overlay, inactive nodes that caused the failure of previous overlay (when no committee has more than 1/3 Byzantine nodes) will be slashed. + + + ## Synchronicity ### How to guarantee synchronicity. In particular how to avoid that in a big network different nodes see a proposal with 2c+1 votes but different votes and thus different random seed - Giacomo @@ -57,4 +96,8 @@ openToc: true > The adversary must cause the GST event to eventually happen after some unknown finite time. Any message sent at time x must be delivered by time $\delta + \text{max}(x,GST)$. In the Partial synchrony model, the system behaves asynchronously till GST and synchronously after GST. -> Moreover, votes travel one level at a time from tree leaves to the tree root. We only need the proof of votes of root+child committees to conclude with a high probability that the majority of nodes have voted. \ No newline at end of file +> Moreover, votes travel one level at a time from tree leaves to the tree root. We only need the proof of votes of root+child committees to conclude with a high probability that the majority of nodes have voted. + +### That's a timeout? How does this work exactly without timing assumptions? Trying to find this in the document -Alvaro + +> `Moh:` Each committee only verifies the votes of its child committees. Once a verified 2/3rd votes of its child members, it then sends it vote to its parent. In this way each layer of the tree verifies the votes (attests) the layer below. Thus, a node does not have to collect and verify 2/3rd of all thousands of votes (as done in other responsive BFTs) but only from its child nodes. \ No newline at end of file diff --git a/content/roadmap/consensus/candidates/carnot/overview.md b/content/roadmap/consensus/candidates/carnot/overview.md index 9289352b4..6833919b4 100644 --- a/content/roadmap/consensus/candidates/carnot/overview.md +++ b/content/roadmap/consensus/candidates/carnot/overview.md @@ -7,7 +7,13 @@ tags: editor: "Corey Petty" --- -Carnot (formerly LogosBFT) is a Byzantine Fault Tolerant (BFT) [consensus](roadmap/consensus/index.md) candidate for the Nomos Network that utilizes Fountain Codes and a committees tree structure to optimize message propagation in the presence of a large number of nodes, while maintaining high througput and fast finality. +Carnot (formerly LogosBFT) is a Byzantine Fault Tolerant (BFT) [consensus](roadmap/consensus/index.md) candidate for the Nomos Network that utilizes Fountain Codes and a committees tree structure to optimize message propagation in the presence of a large number of nodes, while maintaining high througput and fast finality. More specifically, these are the research contributions in Carnot. To our knowledge, Carnot is the first consensus protocol that can achieve together all of these properties: + +1. Scalability: Carnot is highly scalable, scaling to thousands of nodes. +2. Responsiveness: The ability of a protocol to operate with the speed of a wire but not a maximum delay (block delay, slot time, etc.) is called responsiveness. Responsiveness reduces latency and helps the Carnot achieve Fast Finality. Moreover, it improves Carnot's resilience against adversaries that can slow down network traffic. +3. Fork avoidance: Carnot avoids the formation of forks in a happy path. Forks formation has the following adverse consequences that the Carnot avoids. + 1. Wastage of resources on orphan blocks and reduced throughput with increased latency for transactions in orphan blocks + 2. Increased attack vector on PoS as attackers can employ a strategy to force the network to accept their fork resulting in increased stake for adversaries. - [FAQ](roadmap/consensus/candidates/carnot/FAQ.md): Here is a page that tracks various questions people have around Carnot.