create incentivization-outline.md

Sergei Tikhomirov 2023-10-13 16:02:12 +02:00 committed by GitHub
parent 4f65895be8
commit 08233a7ce2
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 230 additions and 0 deletions

incentivization-outline.md Normal file

@ -0,0 +1,230 @@
Our goal is to add an incentivization scheme to Waku to make it (more) incentive-compatible.
In what follows, we abbreviate incentivization as i13n.
We aim to answer the following questions:
1. what is the structure of the protocols in question?
2. what is the desired behavior of protocol participants?
3. what deviations from the desired behavior occur without incentivization?
4. what incentivization tools do we have?
5. what tools are appropriate for our purpose?
6. what parameters can we choose? what are our restrictions?
7. suggest a concrete i13n architecture.
8. how do we check if we've solved the problem?
# Overview
Waku implements a modular, decentralized, censorship-resistant P2P communications protocol.
Waku consists of multiple protocols (see [architecture](https://waku.org/about/architect)).
We focus on the four main protocols: Relay (a P2P protocol) and three light protocols (Filter, Store, and Lightpush), which have a client-server (aka request-response) architecture.
A Waku node is a node that runs at least one of the Waku protocols.
A full Waku node is a node that runs Relay.
A light Waku node is a node that only runs the client side of one of the light protocols.
See also: https://github.com/waku-org/research/issues/28
In light protocols, a client sends a request to a server.
A server (a Relay node) performs some actions and returns a response, in particular:
- [[Filter]]: the server will relay (only) messages that pass a filter to the client;
- [[Store]]: the server responds with messages broadcast within the specified time frame;
- [[Lightpush]]: the server publishes the client's message to the Relay network.
Waku aims to function on widely available hardware.
Hardware requirements for light nodes are lower than for full nodes.
In particular, bandwidth consumption should be limited (estimated at 10 Mbps).
See also: https://github.com/waku-org/research/issues/31
# Store protocol
We first focus on i13n of the Store protocol.
Similar techniques may be later applied to other protocols.
Store is a client-server protocol.
A client asks a server to respond with relevant messages previously relayed through the Relay protocol.
A relevant message is a message that has been broadcast via Relay within the specified time frame.
The response may be split into multiple parts, as specified by pagination parameters.
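
The request-response flow above can be illustrated with a minimal sketch. It assumes an in-memory archive and an offset-based cursor; the names `Message`, `query_store`, `page_size`, and `cursor` are hypothetical and are not taken from the Store specification.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Message:
    timestamp: int  # hypothetical: time of broadcast, in seconds
    payload: bytes


def query_store(
    archive: List[Message],
    start: int,
    end: int,
    page_size: int,
    cursor: int = 0,
) -> Tuple[List[Message], Optional[int]]:
    """Return one page of messages with start <= timestamp <= end.

    `cursor` is an offset into the time-ordered list of matching messages.
    The second element of the result is the cursor for the next page,
    or None when there are no more pages.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    relevant = sorted(
        (m for m in archive if start <= m.timestamp <= end),
        key=lambda m: m.timestamp,
    )
    page = relevant[cursor:cursor + page_size]
    has_more = cursor + page_size < len(relevant)
    return page, (cursor + page_size if has_more else None)
```

A client would call `query_store` repeatedly, passing the returned cursor back, until the cursor is `None`.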
TODO: Strictly speaking, the definition of relevant is inconsistent because there is no consensus over messages. A message may be broadcast but not received by some nodes. Does this happen often? Can and should we do something about it?
## Desired behavior
The desired behavior of a Store-server node is to store all non-ephemeral messages forever.
TODO: address the obvious concern that storing everything forever is unsustainable. Should there be some cut-off time after which old messages are no longer stored?
Let's say, a client issues a request to the server.
We want the following to happen:
- the server responds quickly;
- the response contains all relevant messages;
- the response contains only relevant messages.
TODO: is this the full definition of the desired behavior?
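
As a rough illustration of the last two properties, the sketch below compares a response against an idealized set of relevant message identifiers. This reference set is an assumption: as noted above, there is no consensus over messages, so no party actually holds it. The function name `check_response` and the use of string identifiers are hypothetical.

```python
from typing import Dict, Set


def check_response(response_ids: Set[str], relevant_ids: Set[str]) -> Dict[str, Set[str]]:
    """Compare a response with an (assumed) reference set of relevant messages.

    "missing" lists relevant messages absent from the response;
    "irrelevant" lists messages in the response that are not relevant.
    Both sets are empty for a response that is complete and contains
    only relevant messages.
    """
    return {
        "missing": relevant_ids - response_ids,
        "irrelevant": response_ids - relevant_ids,
    }
```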
### RLN as a proxy metric of message relevance
RLN (Rate Limiting Nullifier) is a method of spam prevention in Relay.
The message sender generates a proof of enrollment in some membership set.
Multiple proofs generated within one epoch lead to punishment.
This technique limits the message rate from each node to at most one message per epoch.
See also: https://rfc.vac.dev/spec/17/
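
The rate limit itself can be sketched without any cryptography. The toy model below assumes that every message carries an opaque `nullifier` that is identical for all messages sent by one member within one epoch. It only detects a violation; proof verification and the punishment mechanism are out of scope. The class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class EpochRateLimiter:
    """Toy model of "at most one message per member per epoch"."""

    def __init__(self) -> None:
        # (nullifier, epoch) -> distinct messages observed for that pair
        self._seen: Dict[Tuple[bytes, int], Set[bytes]] = defaultdict(set)

    def observe(self, nullifier: bytes, epoch: int, message: bytes) -> bool:
        """Record a message; return False if the rate limit is exceeded.

        Seeing the same message again (e.g., a relayed duplicate) is not
        a violation; a second distinct message in the same epoch is.
        """
        seen = self._seen[(nullifier, epoch)]
        seen.add(message)
        return len(seen) <= 1
```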
In the i13n context, we can't prove whether a message has indeed been broadcast in the past.
Instead, we use RLN proofs as a proxy metric.
A valid RLN proof signifies that the message has been generated by a node with an active membership during a particular epoch.
TODO: make sure the above is correct: what exactly does RLN prove?
## Deviations from the desired behavior
There are multiple ways for a node to deviate from the desired behavior.
TODO: are we talking only about the server here, or should we also discuss the client (e.g., DoS)?
### Slow response
The server takes too long to respond.
Possible reasons:
- the server is offline accidentally;
- the request describes too many relevant messages (the server is overwhelmed);
- the server is malicious and deliberately delays the response;
- the server doesn't have some of the relevant messages and tries to request them from other nodes.
### Incomplete response
A relevant message is missing from the response.
Possible explanations:
- the server didn't receive the message when it was broadcast;
- the server deliberately withholds the message.
Contrary to blockchains, Relay doesn't have consensus over relayed messages.
Therefore, it's impossible to distinguish between the two scenarios above.
TODO: given this fact, what's the best we can aim for?
### Irrelevant response
The response contains a message that is not relevant.
There are two scenarios here depending on whether RLN proofs are enforced.
If RLN is not enforced, a server may insert any number of irrelevant messages into the response.
If RLN is enforced, a server can only do so as long as it has a valid membership to generate the respective proofs.
This doesn't eliminate the attack but limits its consequences.
TODO: what are the powers of a malicious server when it comes to generating proofs for irrelevant messages? Can the server generate proofs for past epochs?
## Privacy considerations
Light protocols, in general, have weaker privacy properties than P2P protocols.
In a client-server exchange, a client wants to selectively interact with the network.
By doing so, it often reveals what it is interested in (e.g., subscribes to particular topics).
A malicious Store server can spy on a client in the following ways:
- track what time frames a client is interested in;
- analyze the timing of requests;
- link requests done by the same client.
TODO: expand in the context of an incentivized protocol.
# Cost-benefit analysis
The goal of i13n is to make nodes more likely to exhibit the desired behavior.
An incentive scheme links the payoffs to whether nodes follow the protocol or not.
Good behavior should be rewarded, bad behavior punished.
An incentive scheme should balance the costs and benefits for a node.
Rewards should compensate the cost of good behavior.
Punishments should offset the benefits that bad behavior may bring.
Let us analyze the costs and benefits of a server that are specific to the Store protocol:
- storage;
- bandwidth;
- computation.
Let us assume a constant flow of messages per epoch and a constant flow of requests for older messages.
There are two processes: storing incoming messages, and serving old messages to clients.
The cost of storing incoming messages for one epoch is composed of:
- storage:
  - storage costs of all older messages: proportional to cumulative (message size x time stored);
  - storage costs of newly arrived messages: proportional to message size;
  - a constant cost for I/O operations (storing new messages);
- bandwidth (download) for receiving new messages: proportional to the total size of incoming messages per epoch;
- computational costs of receiving and storing new messages.
(Strictly speaking, the I/O cost may not always be constant due to caching, disk fragmentation, etc.)
The cost of serving messages to clients, per epoch, is composed of:
- storage: none (it's accounted for as storing cost);
- bandwidth:
  - upload: proportional to (number of clients) x (length of time frame requested) x (message size);
  - download: proportional to the number of requests;
- computational cost of handling requests.
TODO: write this down mathematically.
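
One possible way to make the two cost lists above concrete is sketched below. All unit prices in `Prices` are hypothetical parameters, not measurements. The sketch further assumes uniform message size, a constant message rate, one request per client per epoch, and every request covering the same number of epochs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """Hypothetical unit prices (assumptions for illustration only)."""
    storage_per_byte_epoch: float
    bandwidth_per_byte: float
    io_per_epoch: float
    compute_per_message: float
    compute_per_request: float


def storing_cost(p: Prices, epoch: int, msgs_per_epoch: int, msg_size: int) -> float:
    """Cost of the storing process during one epoch (epochs count from 1)."""
    new_bytes = msgs_per_epoch * msg_size
    old_bytes = (epoch - 1) * new_bytes  # everything received in earlier epochs
    storage = p.storage_per_byte_epoch * (old_bytes + new_bytes)
    download = p.bandwidth_per_byte * new_bytes
    compute = p.compute_per_message * msgs_per_epoch
    return storage + p.io_per_epoch + download + compute


def serving_cost(
    p: Prices,
    num_requests: int,
    frame_epochs: int,
    msgs_per_epoch: int,
    msg_size: int,
    request_size: int,
) -> float:
    """Cost of the serving process during one epoch."""
    response_bytes = frame_epochs * msgs_per_epoch * msg_size
    upload = p.bandwidth_per_byte * num_requests * response_bytes
    download = p.bandwidth_per_byte * num_requests * request_size
    compute = p.compute_per_request * num_requests
    return upload + download + compute
```

Note that `storing_cost` grows with `epoch` because older messages keep occupying storage, while `serving_cost` does not depend on the epoch under these assumptions.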
Storage is likely the dominating cost.
Storage cost is proportional to the amount of information stored and the time it is stored for.
A cumulative cost of storing a single message grows linearly with time.
Assuming a constant stream of new messages, the total storage cost is quadratic in time.
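
The linear and quadratic growth can be derived as follows. The symbols are introduced here for illustration only: $r$ is the number of messages per epoch, $s$ is the (uniform) message size, and $c$ is the cost of storing one unit of data for one epoch. The derivation assumes that nothing is ever deleted.

```latex
\begin{align*}
C_{\text{msg}}(\tau) &= c \, s \, \tau
  && \text{(one message stored for $\tau$ epochs: linear)} \\
V(t) &= r \, s \, t
  && \text{(data held during epoch $t$)} \\
C_{\text{total}}(T) &= \sum_{t=1}^{T} c \, V(t)
  = c \, r \, s \, \frac{T(T+1)}{2}
  && \text{(cumulative cost after $T$ epochs: quadratic)}
\end{align*}
```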
The number of messages in a response may be approximated by the length of the time frame requested.
This assumes that messages are broadcast in the Relay network at a constant rate.
Computation: the server spends computing cycles while handling requests.
This cost likely depends not only on the computation itself, but also on the database structure.
For example, retrieving old or rarely requested messages from the local database may be more expensive than fresh or popular ones due to caching.
TODO: In file storage, I store a file and I pay for the ability to query it later. In Store, Alice relays a message, a server stores it, and later Bob queries it (and pays for it under an i13n scheme). Is there a mismatch between who incurs costs and who pays for it? Shall we think of ways to make Alice incur some costs too? See: https://github.com/waku-org/research/issues/32
# Incentivization tools
We can think of incentivization tools as a two-by-two matrix:
- rewards vs punishment;
- monetary vs reputation.
In other words, there are four quadrants:
- monetary reward: the client pays the server;
- monetary punishment: the server makes a deposit in advance and gets slashed in case of misbehavior;
- reputation reward: the server's reputation increases if it behaves well;
- reputation punishment: the server's reputation decreases if it behaves badly.
Reputation can only work if there are tangible benefits of having a high reputation and drawbacks of having a low reputation.
For example:
- clients are more likely to connect to servers with high reputation;
- clients disconnect from servers with low reputation.
Assuming there is a monetary aspect too, low-reputation servers miss out on potential revenue or lose their deposit.
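
A minimal sketch of local reputation on the client side, under simplifying assumptions: scores are integers that move by one per interaction, the client always picks the best-scored server (a deterministic stand-in for "more likely to connect"), and it disconnects at a fixed threshold. The class name, the threshold value, and the scoring rule are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional


class LocalReputation:
    """A client's private scores for the servers it has interacted with."""

    def __init__(self, disconnect_at: int = -3) -> None:
        self.disconnect_at = disconnect_at
        self.scores: Dict[str, int] = defaultdict(int)

    def record(self, server: str, behaved_well: bool) -> None:
        """Reputation reward (+1) or punishment (-1) after an interaction."""
        self.scores[server] += 1 if behaved_well else -1

    def should_disconnect(self, server: str) -> bool:
        return self.scores.get(server, 0) <= self.disconnect_at

    def pick_server(self) -> Optional[str]:
        """Return the best-scored server that is not below the threshold."""
        candidates = [s for s in self.scores if not self.should_disconnect(s)]
        if not candidates:
            return None
        return max(candidates, key=lambda s: self.scores[s])
```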
Reputation, however, assumes either a repeated interaction (i.e., local reputation), or some amount of trust / centralization (centrally managed rankings).
Monetary i13n tools, in turn, pose a key question: how to ensure atomicity between performance and reward or punishment?
In other words, if the client pays first, the server may take the money and not provide the service.
Analogously, if the payment is due after the fact, the client can refuse to pay.
Linking payments with behavior involves a certain amount of trust as well.
This issue is somewhat linked to the problem of Lightning watchtower incentivization (see https://www.talaia.watch/).
A general observation: if monetary flows are dependent on events in the past, and there is no consensus on what exactly happened in the past, the scheme can be exploited.
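
The pay-first versus pay-later dilemma described above can be illustrated with a toy payoff calculation for a single request. The parameters `price`, `value` (what the response is worth to the client), and `cost` (what serving costs the server) are hypothetical, and the model ignores reputation and repeated interaction.

```python
from typing import Tuple


def payoffs(
    pay_first: bool,
    server_serves: bool,
    client_pays: bool,
    price: int,
    value: int,
    cost: int,
) -> Tuple[int, int]:
    """Return (client payoff, server payoff) for one request."""
    if pay_first:
        if not client_pays:
            return (0, 0)  # no payment, so the server never starts
        if server_serves:
            return (value - price, price - cost)
        return (-price, price)  # server takes the money and does not serve
    # Payment is due after the response has been delivered.
    if not server_serves:
        return (0, 0)  # nothing delivered, nothing owed
    if client_pays:
        return (value - price, price - cost)
    return (value, -cost)  # client receives the response and refuses to pay
```

In both orderings, the party that moves second can gain by deviating, which is why some additional mechanism (trust, reputation, or an arbiter) is needed.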
TODO: can we use some on-chain component here as a semi-trusted arbiter?
## Payment methods
What we want from a payment method (order of priority to be discussed):
- wide distribution (many people already have it);
- high liquidity (i.e., easy to buy or sell at a reasonable exchange rate);
- low latency;
- high security.
Let's list all (decentralized) payment options that we have:
- proof-of-work: outsource-able, unavailable for consumer hardware - or is it? (Equihash etc)
- proof-of-X (storage, etc)
- cryptocurrency:
  - ETH
  - a token on Ethereum (ERC20)
  - a token on another EVM blockchain
  - a token on an EVM-based rollup
  - a token on a non-EVM blockchain (BTC / Lightning?)
# Related work
Decentralized storage is not a new idea. What is relevant for us?
1. Federated real-time messaging (IRC, mailing lists). There is no "sync" in IRC; there are simply logs of prior conversations optionally hosted wherever.
2. Centralized file storage (FTP, later Dropbox). Requires trust in availability, but not necessarily confidentiality: content can be encrypted (modulo metadata).
3. P2P file-sharing: Napster, BitTorrent, eDonkey. The power of defaults, local reputation.
4. Decentralized storage in the blockchain age: Storj, Sia, Filecoin, IPFS, Codex...
# Future work
How to generalize i13n for Store to other Waku protocols?