create incentivization-outline.md

2023-10-13 16:02:12 +02:00 · 2023-10-13 16:02:12 +02:00 · 08233a7ce2
parent 4f65895be8
commit 08233a7ce2
1 changed files with 230 additions and 0 deletions
--- a/incentivization-outline.md
+++ b/incentivization-outline.md
@ -0,0 +1,230 @@
+Our goal is to add an incentivization scheme to Waku to make it (more) incentive compatible.
+In what follows, we abbreviate incentivization as i13n.
+
+We aim to answer the following questions:
+
+1. what is the structure of the protocols in question?
+2. what is the desired behavior of protocol participants?
+3. what deviations from the desired behavior occur without incentivization?
+4. what incentivization tools do we have?
+5. what tools are appropriate for our purpose?
+6. what parameters can we chose? what are our restrictions?
+7. suggest a concrete i13n architecture.
+8. how do we check if we've solved the problem?
+
+# Overview
+
+Waku implements a modular decentralized censorship resistant P2P communications protocol.
+Waku consists of multiple protocols (see [architecture](https://waku.org/about/architect)).
+We focus on the main four are Relay (a P2P protocol), and three light protocols: Filter, Store, and Lightpush, which have a client-server architecture (aka request-response).
+
+A Waku node is a node that runs at least one of the Waku protocols.
+A full Waku node is a node that runs Relay.
+A light Waku node as a node that only runs client-side of one of the light protocols.
+See also: https://github.com/waku-org/research/issues/28
+
+In light protocols, a client sends a request to a server.
+A server (a Relay node) performs some actions and returns a response, in particular:
+- [[Filter]]: the server will relay (only) messages that pass a filter to the client;
+- [[Store]]: the server responds with messages broadcast within the specified time frame;
+- [[Lightpush]]: the server publishes the client's message to the Relay network.
+
+Waku aims to function on widely available hardware.
+Hardware requirements for light nodes are lower than for full nodes.
+In particular, bandwidth consumption should be limited (estimated at 10 Mbps).
+See also: https://github.com/waku-org/research/issues/31
+
+# Store protocol
+
+We first focus on i13n of the Store protocol.
+Similar techniques may be later applied to other protocols.
+
+Store is a client-server protocol.
+A client asks the node to respond with relevant messages previously relayed through the Relay protocol.
+A relevant message is a message that has been broadcast via Relay within the specified time frame.
+The response may be split into multiple parts, as specified by pagination parameters.
+
+TODO: Strictly speaking, the definition of relevant is inconsistent because there is no consensus over messages. A message may be broadcast but not received by some nodes. Does this happen often? Can and should we do something about it?
+
+## Desired behavior
+
+The desired behavior of a Store-server node is to store all non-ephemeral messages forever.
+
+TODO: address the obvious concern that storing everything forever is unsustainable. Should there be some cut-off time after which old messages are no longer stored?
+
+Let's say, a client issues a request to the server.
+We want the following to happen:
+
+- the server responds quickly;
+- all the messages in the response are relevant;
+- the response contains only relevant messages.
+
+TODO: is this the full definition of the desired behavior?
+
+### RLN as a proxy metric of message relevance
+
+RLN (rate limiting nullifiers) is a method of spam prevention in Relay.
+The message sender generates a proof of enrollment in some membership set.
+Multiple proofs generated within one epoch lead to punishment.
+This technique limits the message rate from each node to at most one message per epoch.
+See also: https://rfc.vac.dev/spec/17/
+
+In the i13n context, we can't prove whether a message has indeed been broadcast in the past.
+Instead, we use RLN proofs as a proxy metric.
+A valid RLN proof signifies that the message has been generated by a node with an active membership during a particular eposh.
+TODO: make sure the above is correct: what exactly does RLN prove?
+
+## Deviations from the desired behavior
+
+There are multiple ways for a node to deviate from the desired behavior.
+TODO: are we talking only about the server here, or should also discuss client (e.g., DoS)?
+### Slow response
+The server takes too long to respond.
+Possible reasons:
+- the server is offline accidentally;
+- the request describes too many relevant messages (the server is overwhelmed);
+- the server is malicious and deliberately delays the response;
+- the server doesn't have some of the relevant messages and tries to request them from other nodes.
+
+### Incomplete response
+A relevant message is missing from the response.
+Possible explanations:
+- the server didn't receive the message when it was broadcast;
+- the server deliberately withholds the message.
+
+Contrary to blockchains, Relay doesn't have consensus over relayed messages.
+Therefore, it's impossible to distinguish between the two scenarios above.
+TODO: given this fact, what's the best we can aim for?
+
+### Irrelevant response
+The response contains a message that is not relevant.
+There are two scenarios here depending on whether RLN proofs are enforced.
+If RLN is not enforced, a server may insert any number or irrelevant messages into the response.
+If RLN is enforced, a server can only do so as long as it has a valid membership to generate the respective proofs.
+This doesn't eliminate the attackbut limits its consequences.
+
+TODO: what are the powers of a malicious server when it comes to generating proofs for irrelevant messages? Can the server generate proofs for past epochs?
+
+## Privacy considerations
+
+Light protocols, in general, have weaker privacy properties than P2P protocols.
+In a client-server exchange, a client wants to selectively interact with the network.
+By doing so, it often reveals what it is interested in (e.g., subscribes to particular topics).
+
+A malicious Store server can spy on a client in the following ways:
+- track what time frames a client is interested in;
+- analyze the timing of requests;
+- link requests done by the same client.
+
+TODO: expand in the context of an incentivized protocol.
+
+
+# Cost-benefit analysis
+The goal of i13n is to make nodes more likely to exhibit the desired behavior.
+An incentive scheme links the payoffs to whether nodes follow the protocol or not.
+Good behavior should be rewarded, bad behavior punished.
+
+An incentive scheme should balance the costs and benefits for a node.
+Rewards should compensate the cost of good behavior.
+Punishments should offset the benefits that bad behavior may bring.
+
+Let us analyze the costs and benefit of a server that are specific to the Store protocol:
+- storage;
+- bandwidth;
+- computation.
+
+Let us assume a constant flow of messages per epoch and a constant flow of requests for older messages.
+There are two processes: storing incoming messages, and serving old messages to clients.
+
+The cost of storing incoming messages for one epoch is composed of:
+- storage:
+	- storage costs of all older messages: proportional to cumulative (message size x time stored);
+	- storage costs of newly arrived messages: proportional to message size;
+	- a constant cost for I/O operations (storing new messages);
+- bandwidth (download) for receiving new messages: proportional to the total size of incoming messages per epoch;
+- computational costs of receiving and storing new messages.
+
+(Strictly speaking, the I/O cost may not always be constant due to caching, disk fragmentation, etc.)
+
+The cost of storing messages to clients, per epoch, is composed of:
+- storage: none (it's accounted for as storing cost);
+- bandwidth
+	- upload: proportional to (number of clients) x (length of time frame requested) x (message size);
+	- download: proportional to the number of requests;
+- computational cost of handling requests.
+
+TODO: write this down mathematically.
+
+Storage is likely the dominating cost.
+Storage costs is proportional to the amount of information stored and the time it is stored for.
+A cumulative cost of storing a single message grows linearly with time.
+Assuming a constant stream of new messages, the total storage cost is quadratic in time.
+
+The number of messages in a response may be approximated by the length of the time frame requested.
+This assumes that messages are broadcast in the Relay network at a constant rate.
+
+Computation: the server spends computing cycles while handling requests.
+This costs likely depends not only on the computation itself, but also at the database structure.
+For example, retrieving old or rarely requested messages from the local database may be more expensive than fresh or popular ones due to caching.
+
+TODO: In file storage, I store a file and I pay for the ability to query it later. In Store, Alice relays a message, a server stores is, and later Bob queries it (and pays for it under an i13n scheme). Is there a mismatch between who incurs costs and who pays for it? Shall we think of ways to make Alice incur some costs too? See: https://github.com/waku-org/research/issues/32
+
+# Incentivization tools
+
+We can think of incentivization tools as a two-by-two matrix:
+- rewards vs punishment;
+- monetary vs reputation.
+
+In other words, there are four quadrants:
+- monetary reward: the client pays the server;
+- monetary punishment: the server makes a deposit in advance and gets slashed in case of misbehavior;
+- reputation reward: the server's reputation increases if it behaves well;
+- reputation punishment: the server's reputation decreases if it behaves badly.
+
+Reputation can only work if there are tangible benefits of having a high reputation and drawbacks of having a low reputation.
+For example:
+- clients are more likely to connect to servers with high reputation;
+- clients disconnect from servers with low reputation.
+Assuming there is a monetary aspect too, low-reputation servers miss out on potential revenue or lose their deposit.
+Reputation, however, assumes ether a repeated interaction (i.e., local reputation), or some amount of trust / centralization (centrally managed rankings).
+
+Monetary i13n tools, in turn, pose a key question: how to ensure atomicity between performance and reward or punishment?
+In other words, if the client pays first, the server may take the money and not provide the servers.
+Analogously, if the payment is due after the fact, the client can refuse to pay.
+Linking payments with behavior involves a certain amount of trust as well.
+
+This issue is somewhat linked to the problem of Lightning watchtower incentivization (see https://www.talaia.watch/).
+
+A general observation: if monetary flows are dependent on events in the past, and there is no consensus on what exactly happened in the past, the scheme can be exploited.
+TODO: can we use some on-chain component here as a semi-trusted arbiter?
+
+## Payment methods
+
+What we want from a payment method (order of priority to be discussed):
+- wide distribution (many people already have it);
+- high liquidity (i.e., easy to buy or sell at a reasonable exchange rate);
+- low latency;
+- high security.
+
+Let's list all (decentralized) payment options that we have:
+- proof-of-work: outsource-able, unavailable for consumer hardware - or is it? (Equihash etc)
+- proof-of-X (storage, etc)
+- cryptocurrency:
+	- ETH
+	- a token on Ethereum (ERC20)
+	- a token on another EVM blockchain
+	- a token on an EVM-based rollup
+	- a token on a non-EVM blockchain (BTC / Lightning?)
+
+# Related work
+
+Decentralized storage is not a new idea. What is relevant for us?
+
+1. Federated real-time messaging (IRC, mailing lists). There is no "sync" in IRC; there are simply logs of prior conversations optionally hosted wherever.
+2. Centralized file storage (FTP, later Dropbox). Requires trust in availability, but not necessarily confidentiality: content can be encrypted (modulo metadata).
+3. P2P file-sharing: Napster, BitTorrent, eDonkey. The power of defaults, local reputation.
+4. Decentralized storage in the blockchain age: Storj, Sia, Filecoin, IPFS, Codex...
+
+# Future work
+
+How to generalize i13n for Store to other Waku protocols?