* Refactor Block/Header definition
Refactor the block/header definition so that it's now the responsibility
of the nomos-core crate. This removes definitions in ledger/consensus
crates since there's no need at that level to have an understanding
of the block format.
The new header format supports both carnot and cryptarchia.
* A wrapper crate for prometheus client
* Initial integration of metrics for mempool
* Merge mempool metrics imports
* Add cli flag to enable metrics
* Add nomos metrics service for serving metrics
* Use nomos prometheus metrics in the node
* Rename metrics to registry where applicable
* Expose metrics via http
* Featuregate the metrics service
* Style and fail on encode error
* Add metrics cargo feature for mempool
* Add chat demo for testnet
This commit adds a simple demo to showcase the capabilities of the
Nomos architecture. In particular, we want to leverage the DA
features and explore participants' roles.
At the same time, we're not ready to commit to any specific format
or decision regarding common ground yet.
For this reason, we chose to implement the demo at the Execution
Zone (EZ) level.
In contrast to the coordination layer, each execution
zone can decide on its own format, which allows us to experiment
without having to set a standard.
The application of choice for the demo is an (almost) instant
messaging app where the messages are broadcast to the public by
leveraging the full replication data availability protocol.
In this context, the cli app acts as a small EZ disseminating
blobs, promoting blob inclusion and updating its state (i.e. list
of exchanged messages) upon blob inclusion in the chain.
---------
Co-authored-by: danielsanchezq <sanchez.quiros.daniel@gmail.com>
Put a hard limit of 512 blocks in the response returned by
GetBlocks to avoid slowing things down. This number was chosen
rather arbitrarily. We might want to do some more fine tuning.
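A minimal sketch of how such a cap could be applied, assuming blocks
are linked by parent id and stored in a map. The `Block` struct,
`get_blocks` signature and `MAX_BLOCKS` name are illustrative
assumptions, not the actual service code:

```rust
use std::collections::HashMap;

/// Hypothetical constant mirroring the 512-block cap described above.
const MAX_BLOCKS: usize = 512;

/// Simplified block: only the links needed to walk the chain.
#[derive(Clone, Debug, PartialEq)]
struct Block {
    id: u64,
    parent: u64,
}

/// Walks parent links from `from` back to `to` (inclusive) and stops
/// early once `MAX_BLOCKS` blocks have been collected.
fn get_blocks(store: &HashMap<u64, Block>, from: u64, to: u64) -> Vec<Block> {
    let mut out = Vec::new();
    let mut cur = from;
    while out.len() < MAX_BLOCKS {
        // An unknown id ends the walk instead of failing the request.
        let Some(block) = store.get(&cur) else { break };
        out.push(block.clone());
        if cur == to {
            break;
        }
        cur = block.parent;
    }
    out
}
```

A caller that needs a longer range would have to issue several requests,
using the parent of the last returned block as the next starting point.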
* Split long consensus response in separate APIs
Consensus info was returning the full list of blocks even though
that can get quite large with time. Instead, this commit changes
that API to return a constant size message and adds a new one to
return a chain of blocks with user specified endings.
* Update nomos-services/consensus/src/lib.rs
Co-authored-by: Youngjoon Lee <taxihighway@gmail.com>
* Fix test
---------
Co-authored-by: Youngjoon Lee <taxihighway@gmail.com>
Block.id is a necessary piece of information in the context
of the consensus engine, where it's not possible to recompute
the id of a block because its contents are not available.
Instead, we should only skip that field when serializing/deserializing
a full block.
* humanize array ser/deser
* split fns
* use `const-hex`
* fix fmt
* create `nomos-utils` crate
* Human serde committeeid (#478)
* Human readable serde for CommitteeId
* Deserialize bytes to string if human readable
* Don't allocate if possible in human serde bytes
---------
Co-authored-by: gusto <bacv@users.noreply.github.com>
* Change impl of StorageReceiver to Option<Bytes>
Load and remove messages return Option<Bytes> and not Bytes, so
let's change the implementation to work around that.
* Add storage/block http api to retrieve blocks from storage
* add tests for storage/block api
* debug tests
* tweak test node online condition
* Update deps
* Implement base lifecycle handling in network service
* Implement base lifecycle handling in storage service
* Use methods instead of functions
* Pipe lifecycle in metrics service
* Pipe lifecycle in mempool service
* Pipe lifecycle in log service
* Pipe lifecycle in da service
* Pipe lifecycle in consensus service
* Refactor handling of lifecycle message to should_stop_service
* Update overwatch version to fixed run_all one
* Use tree overlay in nomos node
* Use tree overlay in tests module
* Handle unexpected consensus vote stream end in tally
* Spawn the next leader first (#425)
* Report unhappy blocks in happy path test (#430)
* report unhappy blocks in the happy path test
* Make threshold configurable for TreeOverlay (#426)
* Modify test so that nodes don't all connect to the first node only (#442)
* merge fix
---------
Co-authored-by: Youngjoon Lee <taxihighway@gmail.com>
Co-authored-by: Al Liu <scygliu1@gmail.com>
* Send DA certificate to node after dissemination
* rename mempool endpoints
* Check certificate inclusion in tests
* rename endpoint
* Rename addcert and addtx to add
* tweak test condition
* add option to save certificate to file
* move thread join
* remove fancy prints
* Use more descriptive names for generic parameters
We're starting to have tens of generic parameters; if we don't use
descriptive names it'll be pretty hard to understand what they do.
This commit changes the names of the mempool parameters in
preparation for the insertion of the da certificates mempool to
distinguish it from cl transactions.
* Add mempool for da certificates
* Add separate certificates mempool to binary
* ignore clippy lints
* Add `send` method to mempool network adapter
Centralize responsibilities for mempool-network interface in the
adapter trait.
* Update nomos-services/mempool/src/lib.rs
Co-authored-by: Youngjoon Lee <taxihighway@gmail.com>
---------
Co-authored-by: Youngjoon Lee <taxihighway@gmail.com>
Overwatch requires all services to have a different service id.
Unfortunately, such service id can't depend on generic parameters,
which means that we can't have two instances of the mempool
service even if they are instantiated with different types.
This commit circumvents this limitation by adding another
type parameter.
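One way to picture the extra type parameter is as a marker type that
carries the service id. The sketch below is an assumption about the
shape of the workaround: the `Discriminant` trait, the marker structs
and the id strings are hypothetical, and the Overwatch traits are left
out entirely:

```rust
use std::marker::PhantomData;

/// Hypothetical marker trait: each implementor carries a distinct id.
trait Discriminant {
    const ID: &'static str;
}

struct TxDiscriminant;
struct CertDiscriminant;

impl Discriminant for TxDiscriminant {
    const ID: &'static str = "mempool-cl";
}

impl Discriminant for CertDiscriminant {
    const ID: &'static str = "mempool-da";
}

/// Simplified service: `Item` is what gets stored, `D` only picks the id.
struct MempoolService<Item, D: Discriminant> {
    pending: Vec<Item>,
    _discriminant: PhantomData<D>,
}

impl<Item, D: Discriminant> MempoolService<Item, D> {
    /// The id is taken from the marker type, not derived from `Item`.
    const SERVICE_ID: &'static str = D::ID;

    fn new() -> Self {
        Self {
            pending: Vec::new(),
            _discriminant: PhantomData,
        }
    }
}
```

With this shape, two instances that store the same item type still get
different ids as long as their marker types differ.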
* Make mempool item generic
Make the mempool generic with respect to the item and remove
mentions of specific transaction formats/traits. This will allow
us to reuse the same code for both coordination layer transactions
and certificates, or in general, whatever items need to be included
in a block.
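As an illustration of what "generic with respect to the item" can look
like, here is a small sketch. The `Mempool` type and its method names
are assumptions made for this example and do not reflect the real
trait bounds in the crate:

```rust
use std::collections::BTreeMap;

/// Minimal mempool sketch, generic over the key and the stored item.
struct Mempool<Key, Item> {
    pending: BTreeMap<Key, Item>,
}

impl<Key: Ord, Item> Mempool<Key, Item> {
    fn new() -> Self {
        Self {
            pending: BTreeMap::new(),
        }
    }

    /// Adds an item; returns `false` if the key is already pending.
    fn add(&mut self, key: Key, item: Item) -> bool {
        if self.pending.contains_key(&key) {
            return false;
        }
        self.pending.insert(key, item);
        true
    }

    /// Drops the items that made it into a block.
    fn mark_included(&mut self, keys: &[Key]) {
        for key in keys {
            self.pending.remove(key);
        }
    }

    fn len(&self) -> usize {
        self.pending.len()
    }
}
```

The same code can then hold transactions in one instance and
certificates in another, since nothing in it refers to a specific format.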
* Add mempool network adapter settings
Allow for greater customization of the mempool network adapter by
adding a settings field.
* update node after mempool changes
* fix waku mempool adapter
* fmt
* fix tests
* fmt
* Use selection for blob certificates
* Fix bin imports
* Fix rebase
* Missing blobs -> certificates refactor
* Fix attestation and certificate as_bytes
* More naming refactors
* Make the data availability service work with multiple protocols
* Add a generic way to instantiate DaProtocol
Add settings type and a new `new(Self::Settings)` method to
build a new DaProtocol instance
* Add data availability service to node
* fix tests
* fix imports
* Add `mixnode` and `mixnet-client` crate (#302)
* Add `mixnode` binary (#317)
* Integrate mixnet with libp2p network backend (#318)
* Fix #312: proper delays (#321)
* proper delays
* add missing duration param
* tiny fix: compilation error caused by `rand` 0.8 -> 0.7
* use `get_available_port()` for mixnet integration tests (#333)
* add missing comments
* Overwatch mixnet node (#339)
* Add mixnet service and overwatch app
* remove #[tokio::main]
---------
Co-authored-by: Youngjoon Lee <taxihighway@gmail.com>
* fix tests for the overwatch mixnode (#342)
* fix panic when a corner case happens in RandomDelayIter (#335)
* Use `log` service for `mixnode` bin (#341)
* Use `wire` for MixnetMessage in libp2p (#347)
* Prevent mixnet tests from running forever (#363)
* Use random delay when sending msgs to mixnet (#362)
* fix a minor compilation error caused by the latest master
* Fix run output fd (#343)
* add a connection pool
* Exp backoff (#332)
* move mixnet listening into separate task
* add exponential retry for insufficient peers in libp2p
* fix logging
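For reference, exponential retry can be sketched as below. This is a
generic, blocking version with hypothetical names (`backoff_delay`,
`retry_with_backoff`); the actual libp2p backend is asynchronous and
its retry parameters are not stated here:

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
fn backoff_delay(base: Duration, attempt: u32, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    // `checked_mul` returns `None` on overflow, in which case we use the cap.
    base.checked_mul(factor).map_or(max, |delay| delay.min(max))
}

/// Calls `op` until it succeeds or `max_attempts` calls have failed,
/// sleeping with exponentially growing delays in between.
fn retry_with_backoff<T, E>(
    mut op: impl FnMut() -> Result<T, E>,
    max_attempts: u32,
    base: Duration,
    max: Duration,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: hand the last error back to the caller.
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => {
                std::thread::sleep(backoff_delay(base, attempt, max));
                attempt += 1;
            }
        }
    }
}
```

Capping the delay keeps a long outage from pushing the wait between
attempts to unreasonable values.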
* Fix MutexGuard across await (#373)
* Fix MutexGuard across await
Holding a MutexGuard across an await point is not a good idea.
Removing that solves the issues we had with the mixnet test
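The usual way to avoid this is to confine the guard to a block that
ends before the first `.await`. A minimal sketch, assuming a
`std::sync::Mutex` around a queue; the `send` and `forward_next`
functions are hypothetical stand-ins and not the mixnet code:

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// Stand-in for an async network write (hypothetical).
async fn send(_body: Vec<u8>) {}

/// Pops the next body under the lock, releases the lock, then awaits.
async fn forward_next(queue: Arc<Mutex<VecDeque<Vec<u8>>>>) -> bool {
    let next = {
        let mut guard = queue.lock().unwrap();
        guard.pop_front()
    }; // `guard` is dropped here, before any await point
    match next {
        Some(body) => {
            send(body).await;
            true
        }
        None => false,
    }
}

/// Compile-time check: the future is `Send` only because no
/// `MutexGuard` is held across `.await`.
fn assert_send<T: Send>(_: &T) {}
```

Because the lock is released before awaiting, other tasks can take it
while the send is in flight.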
* Make mixnode handle bodies coming from the same source concurrently (#372)
---------
Co-authored-by: Youngjoon Lee <taxihighway@gmail.com>
* Move wait at network startup (#338)
We now wait after the call to 'subscribe' to give the network
the time to register peers in the mesh before starting to
publish messages
* Remove unused functions from mixnet connpool (#374)
* Mixnet benchmark (#375)
* merge fixes
* add `connection_pool_size` field to `config.yaml`
* Simplify mixnet topology (#393)
* Simplify bytes and duration range ser/de (#394)
* optimize bytes serde and duration serde
---------
Co-authored-by: Al Liu <scygliu1@gmail.com>
Co-authored-by: Daniel Sanchez <sanchez.quiros.daniel@gmail.com>
Co-authored-by: Giacomo Pasini <Zeegomo@users.noreply.github.com>
First, a failure to deserialize a network message is not
an error, especially since we're using a public channel.
Second, that same channel is shared by different kinds of messages,
so trying to interpret one as the other will surely lead to an
unsuccessful attempt.
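In code, this amounts to skipping messages that do not decode instead
of propagating an error. A minimal sketch with a made-up wire format
(one tag byte followed by a little-endian u64); the format and the
function names are assumptions for illustration only:

```rust
/// Hypothetical tag marking a vote message on the shared channel.
const VOTE_TAG: u8 = 0x01;

/// Returns `Some(view)` if `bytes` is a well-formed vote, `None` otherwise.
fn decode_vote(bytes: &[u8]) -> Option<u64> {
    let (tag, rest) = bytes.split_first()?;
    if *tag != VOTE_TAG {
        return None; // another kind of message sharing the channel
    }
    let payload: [u8; 8] = rest.try_into().ok()?;
    Some(u64::from_le_bytes(payload))
}

/// Keeps the messages that decode as votes and silently drops the rest.
fn votes<'a>(messages: impl Iterator<Item = &'a [u8]>) -> Vec<u64> {
    messages.filter_map(decode_vote).collect()
}
```

Anything that fails to decode is treated as "not for us" rather than
as a fault in the stream.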
* Add basic da module and traits
* Pipe new blobs and internal message handling
* Add and pipe send attestation method
* Add blob trait
* Make da backend async
* Implement mock backend
* Bound blob in da backend to blob trait
* Added remove blob
* Rename reply to attestation
* Add support for libp2p backend in integration tests
* Add support for libp2p in nomos-node
* change default to waku
* add mutually exclusive features warning
* disable default features to avoid unification
* disable default features
* remove leftover cargo build
* Make sure we are subscribed to libp2p topic at startup
* unify imports
* typo in ci config
* Sequential build and test steps for features
* Add RandomBeaconState to libp2p carnot variant
---------
Co-authored-by: Gusto <bacvinka@gmail.com>
* Move roundrobin to leadership module
* Use references in leader selection
* Add membership traits to overlay and create membership module
* Implement committee membership for random beacon state
* Update flat overlay
* Create updateable membership traits and impls
* Update tree overlay
* Update overlay on consensus service
* Update overlay on simulations nomos node
* Update types on tests and modules
* Use chacha for shuffling
* Change to mut slice instead of inner cloning
* Use fisher yates shuffle from scratch
* Stylish and clippy happy
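A from-scratch Fisher-Yates shuffle over a mutable slice can be
sketched as follows. To stay dependency-free, this example swaps in a
small SplitMix64 generator where the commits above use ChaCha; the
`SplitMix64` type and `shuffle` signature are illustrative assumptions:

```rust
/// Stand-in deterministic generator (SplitMix64). This is NOT ChaCha
/// and is used here only to keep the example self-contained.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// In-place Fisher-Yates shuffle: walk from the end and swap each
/// element with a randomly chosen one at or before its position.
fn shuffle<T>(items: &mut [T], rng: &mut SplitMix64) {
    for i in (1..items.len()).rev() {
        // Modulo bias is ignored here for brevity.
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        items.swap(i, j);
    }
}
```

Seeding the generator with the same value on every node yields the same
permutation everywhere, and working on a mutable slice avoids cloning
the member list.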
* Generate network events for self messages
Waku does that and it's kind of convenient not to handle ourselves
in a different way from the rest.
* Use bigger buffer + fmt
When receiving messages coming from libp2p IWANT requests, it's
common to receive a burst of packets which can cause subscribers
to lag. To account for that, let's increase the buffer in the
broadcast channel.
* Check if topic is being subscribed before self-notification (#292)
* fmt
---------
Co-authored-by: Youngjoon Lee <taxihighway@gmail.com>
* Add initial implementation of libp2p consensus adapter
Co-authored-by: Youngjoon Lee <taxihighway@gmail.com>
* fix
* Handle all message types received via gossipsub (#283)
* remove todo
Co-authored-by: Youngjoon Lee <taxihighway@gmail.com>
---------
Co-authored-by: Youngjoon Lee <taxihighway@gmail.com>
* Rework NetworkAdapter API
The NetworkAdapter API failed to isolate the internals by
providing a way to send a message to a user-provided channel while
the stream listeners expected specific formats.
Unify network messages under the same enum and simplify sending/
broadcasting messages.
* remove redundant inlines
* use committee.id()
* fmt
* Make timeout configurable
Add a way to configure the consensus timeout at startup.
* Make leader threshold and timeout configurable in tests
* Add tests for the unhappy path
Add a test for the unhappy path by stopping a node.
The rest of the peers are sufficient to reach a quorum but the
offline node will fail to produce a block when it's its turn as a
leader, thus triggering the recovery procedure twice before the
test is considered complete.
* ignore clippy warning
* CommitteeId type wrapper
* rewrite committee id logic
* remove unused convert
* use correct way to get committee id by member id
* cleanup and add comment
* Simplify child committees method
---------
Co-authored-by: danielsanchezq <sanchez.quiros.daniel@gmail.com>
* Add unhappy path handlers for first view
Nodes resolving the first view were not instructed to handle
possible failures. In particular, no timeouts were configured
and nodes were not listening for timeouts from other nodes.
This commit fixes this by making them use the
same path as every other view.
* Endless timeouts
Keep signaling timeout until consensus is achieved.
This could be helpful in network partitioning where messages could
be lost.
* Ensure consistent committee hash
Sort committee members before hashing to ensure all nodes in the
network obtain a consistent hash
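The idea can be sketched as follows, using the standard library's
`DefaultHasher` as a stand-in for whatever hash function the overlay
actually uses. The 32-byte member id and the `committee_hash` name are
assumptions for this example:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hashes a committee independently of the order its members are listed in.
fn committee_hash(members: &[[u8; 32]]) -> u64 {
    // Sort a copy first so every node feeds the hasher the same sequence.
    let mut sorted = members.to_vec();
    sorted.sort_unstable();
    let mut hasher = DefaultHasher::new();
    sorted.hash(&mut hasher);
    hasher.finish()
}
```

Without the sort, two nodes holding the same members in a different
order would feed the hasher different input and compute different ids
for the same committee.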
* Fix timeout_qc filtering condition
'>' was used instead of '=='
* Fix new view topic name
'votes' was used instead of 'new-view'
* Fix filtering condition for unhappy path tally
Rogue '!' at the beginning
* Add timeout tally
Filter timeouts to allow only timeouts for the current view
coming from root committee members.
We might want to try to unify this with happy and unhappy tally.
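The filtering rule can be sketched as below. The `Timeout` struct, the
numeric member ids, the per-sender deduplication and the `threshold`
parameter are assumptions for illustration; only the two filter
conditions come from the description above:

```rust
use std::collections::HashSet;

/// Simplified timeout message (hypothetical shape).
struct Timeout {
    view: u64,
    sender: u32,
}

/// Returns `true` once at least `threshold` distinct root committee
/// members have signalled a timeout for `current_view`.
fn tally_timeouts(
    timeouts: &[Timeout],
    current_view: u64,
    root_committee: &HashSet<u32>,
    threshold: usize,
) -> bool {
    let voters: HashSet<u32> = timeouts
        .iter()
        // Equality, not '>': only timeouts for the current view count.
        .filter(|t| t.view == current_view)
        .filter(|t| root_committee.contains(&t.sender))
        .map(|t| t.sender)
        .collect();
    voters.len() >= threshold
}
```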
* Add debug logs
* clippy happy
* add basic integration tests
* add a way to configure overlay threshold
* Save logs to file in case of failure
* Increase number of test nodes to 10
* fix tests
* use fraction instead of tuple
* fmt
* Fix order of network streams
When fetching a message from the network, we need to first listen
for incoming messages and then look at the storage. If we do this
in the opposite order, there's a brief moment where we've frozen
our view of stored messages and are not yet listening for incoming
ones, thus risking losing messages.
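A simplified, synchronous sketch of that ordering, using a standard
library channel in place of the network stream. The `fetch` function,
the numeric message ids and the deduplication step are assumptions made
for this example:

```rust
use std::collections::HashSet;
use std::sync::mpsc::Receiver;

/// Subscribes first, then reads storage, then merges both sources
/// while dropping messages seen twice.
fn fetch<S, L>(subscribe: S, load_stored: L) -> Vec<u64>
where
    S: FnOnce() -> Receiver<u64>,
    L: FnOnce() -> Vec<u64>,
{
    let live = subscribe(); // 1. start listening for incoming messages
    let stored = load_stored(); // 2. only then look at the storage
    let mut seen = HashSet::new();
    // A message that arrives between the two steps may appear in both
    // sources, so duplicates are filtered out by id.
    let merged: Vec<u64> = stored
        .into_iter()
        .chain(live.try_iter())
        .filter(|id| seen.insert(*id))
        .collect();
    merged
}
```

The cost of listening first is a possible duplicate, which is cheap to
filter, whereas the opposite order can drop a message with no way to
notice.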
* Wait for waku db spurious delays
Sometimes waku takes some time (e.g. a few seconds) before making
a received message available through the archive query. Let's
account for this by making repeated calls if the first one is not
successful.
* Add initial network wait
We've observed in testing that even if waku reports that some
peers are connected it can't really deliver a message. To overcome
this limitation, we add a wait at the network service startup.
We know this is not ideal, but Waku will eventually be replaced and
we're looking for a quick fix.
* fmt
* add missing accessors
* add api to return info from consensus service
* move processing into its own function
* feat: add HTTP GET `/carnot/info` API (#178)
---------
Co-authored-by: Youngjoon Lee <taxihighway@gmail.com>
The consensus service was becoming a bit messy and difficult to maintain.
This PR moves the handling of individual events into their own functions and refactors the cancelable task system into a separate struct.