tweak

2025-02-22 08:38:18 +00:00 · 2019-10-01 13:20:10 +08:00 · 2019-10-01 13:20:10 +08:00 · 0431f5fcee
commit 0431f5fcee
parent 33266f1b67
1 changed file with 25 additions and 51 deletions
--- a/_posts/2019-10-01-remote-log.md
+++ b/_posts/2019-10-01-remote-log.md
@@ -1,39 +1,23 @@
 ---
 layout: post
-name:  "Remote log"
+name:  "P2P Data Sync with a Remote Log"
 title:  "P2P Data Sync with a Remote Log"
 date:   2019-10-01 12:00:00 +0800
 author: oskarth
-published: true
+published: false
 permalink: /remote-log
 categories: research
-summary: A research log. Reliable and decentralized, pick two.
-image: /assets/img/remote_log.png
+summary: A research log. Asynchronous P2P messaging? Remote logs to the rescue!
+image: /assets/img/remote-log.png
 ---

-A big problem when doing end-to-end data sync between mobile nodes is that most
-devices are offline most of the time. With a naive approach, you quickly run
-into issues of 'ping-pong' behavior, where messages have to be constantly
-retransmitted. We saw some basic calculations of what this bandwidth multiplier
-looks like in a [previous post](https://vac.dev/p2p-data-sync-for-mobile).
+A big problem when doing end-to-end data sync between mobile nodes is that most devices are offline most of the time. With a naive approach, you quickly run into issues of 'ping-pong' behavior, where messages have to be constantly retransmitted. We saw some basic calculations of what this bandwidth multiplier looks like in a [previous post](https://vac.dev/p2p-data-sync-for-mobile).

-While you could do some background processing, this is really draining the
-battery, and on iOS these capabilities are limited. A better approach instead is
-to loosen the constraint that two nodes need to be online at the same time. How
-do we do this? There are two main approaches, one is the *store and forward
-model*, and the other is a *remote log*.
+While you could do some background processing, this is really draining the battery, and on iOS these capabilities are limited. A better approach instead is to loosen the constraint that two nodes need to be online at the same time. How do we do this? There are two main approaches, one is the *store and forward model*, and the other is a *remote log*.

-In the *store and forward* model, we use an intermediate node that forward
-messages on behalf of the recipient. In the *remote log* model, you instead
-replicate the data onto some decentralized storage, and have a mutable reference
-to the latest state, similar to DNS. While both work, the latter is somewhat
-more elegant and "pure", as it has less strict requirements of an individual
-node's uptime. Both act as a highly-available cache to smoothen over
-non-overlapping connection windows between endpoints.
+In the *store and forward* model, we use an intermediate node that forwards messages on behalf of the recipient. In the *remote log* model, you instead replicate the data onto some decentralized storage, and have a mutable reference to the latest state, similar to DNS. While both work, the latter is somewhat more elegant and "pure", as it has less strict requirements of an individual node's uptime. Both act as a highly-available cache to smooth over non-overlapping connection windows between endpoints.

-In this post we are going to describe how such a remote log schema could work.
-Specifically, how it enhances p2p data sync and takes care of the [following
-requirements](https://vac.dev/p2p-data-sync-for-mobile):
+In this post we are going to describe how such a remote log schema could work. Specifically, how it enhances p2p data sync and takes care of the [following requirements](https://vac.dev/p2p-data-sync-for-mobile):

 > 3. MUST allow for mobile-friendly usage. By mobile-friendly we mean devices
 >    that are resource restricted, mostly-offline and often changing network.
@@ -45,12 +29,9 @@ requirements](https://vac.dev/p2p-data-sync-for-mobile):

 ## Remote log

-A remote log is a replication of a local log. This means a node can read data
-from a node that is offline.
+A remote log is a replication of a local log. This means a node can read data from a node that is offline.

-The spec is in an early draft stage and can be found
-[here](https://github.com/vacp2p/specs/pull/16). A very basic spike can be found
-[here](https://github.com/vacp2p/research/tree/master/remote_log).
+The spec is in an early draft stage and can be found [here](https://github.com/vacp2p/specs/pull/16). A very basic spike can be found [here](https://github.com/vacp2p/research/tree/master/remote_log). The rest of this post follows the current spec closely.

 ### Definitions

@@ -170,10 +151,7 @@ The *remote log* protobuf is what is stored at the Name system.

 ### Remote log

-The remote log lets receiving nodes know what data they are missing. Depending
-on the specific requirements and capabilities of the nodes and name system, the
-information can be referred to differently. We distinguish between three rough
-modes:
+The remote log lets receiving nodes know what data they are missing. Depending on the specific requirements and capabilities of the nodes and name system, the information can be referred to differently. We distinguish between three rough modes:

 1. Fully replicated log
 2. Normal sized page with CAS mapping
@@ -189,17 +167,11 @@ modes:
 | next_page   |
 ```

-Here the upper section indicates a list of ordered pairs, and the lower section
-contains the address for the next page chunk. `H1` is the native hash function,
-and `H2` is the one used by the CAS. The numbers corresponds to the messages.
-
-To indicate which CAS is used, a remote log SHOULD use a multiaddr.
+Here the upper section indicates a list of ordered pairs, and the lower section contains the address for the next page chunk. `H1` is the native hash function, and `H2` is the one used by the CAS. The numbers correspond to the messages. To indicate which CAS is used, a remote log SHOULD use a multiaddr.

 **Embedded data:**

-A remote log MAY also choose to embed the wire payloads that corresponds to the
-native hash. This bypasses the need for a dedicated CAS and additional
-round-trips, with a trade-off in bandwidth usage.
+A remote log MAY also choose to embed the wire payloads that correspond to the native hash. This bypasses the need for a dedicated CAS and additional round-trips, with a trade-off in bandwidth usage.

 ```
 | H1_3 | | C_3 |
@@ -211,11 +183,7 @@ round-trips, with a trade-off in bandwidth usage.

 Here `C` stands for the content that would be stored at the CAS.

-Both patterns can be used in parallel, e,g. by storing the last `k` messages
-directly and use CAS pointers for the rest. Together with the `next_page` page
-semantics, this gives users flexibility in terms of bandwidth and
-latency/indirection, all the way from a simple linked list to a fully replicated
-log. The latter is useful for things like backups on durable storage.
+Both patterns can be used in parallel, e.g. by storing the last `k` messages directly and using CAS pointers for the rest. Together with the `next_page` page semantics, this gives users flexibility in terms of bandwidth and latency/indirection, all the way from a simple linked list to a fully replicated log. The latter is useful for things like backups on durable storage.

 ### Next page semantics

@@ -226,11 +194,17 @@ in time.

 ### Interaction with MVDS

-TBD.
-
-<!-- TODO: Elaborate on interaction with MVDS, especially with what messages are synced, etc -->
-
+`vac.mvds.Message` payloads are the only payloads that MUST be uploaded. Other message types MAY be uploaded, depending on the implementation.

 ## Future work

-TBD.
+The spec is still in an early draft stage, so it is expected to change. Same with the proof of concept. More work is needed on getting a fully featured proof of concept with specific CAS and NAS instances. E.g. Swarm and Swarm Feeds, or IPFS and IPNS, or something else.
+
+For data sync in general:
+- Make consistency guarantees more explicit for app developers with support for sequence numbers and DAGs, as well as the ability to send non-synced messages. E.g. ephemeral typing notifications, linear/sequential history and causal consistency/DAG history
+- Better semantics and scalability for multi-user sync contexts, e.g. CRDTs and joining multiple logs together
+- Better usability in terms of application layer usage (data sync clients) and supporting more transports
+
+---
+
+P.S. Next week on October 10th decanus and I will be presenting Vac at [Devcon](https://devcon.org/agenda), come say hi! D.S.
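
Note on the post being changed: the paged remote log it describes (ordered pairs of native hash `H1` and CAS hash `H2`, a `next_page` pointer, and optionally embedded content `C` for the last `k` messages) might be sketched roughly as below. This is a minimal illustration and not the spec's or the spike's code. The choice of SHA-256 and BLAKE2b for `H1` and `H2`, the `Page` class, the `page-N` addresses, and the use of plain dicts as the CAS and page store are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


def h1(payload: bytes) -> str:
    """Stand-in for the native hash H1 (SHA-256 is an assumption)."""
    return hashlib.sha256(payload).hexdigest()


def h2(payload: bytes) -> str:
    """Stand-in for the CAS hash H2 (BLAKE2b is an assumption)."""
    return hashlib.blake2b(payload).hexdigest()


# One entry: (H1, H2 or None, embedded content C or None).
Entry = Tuple[str, Optional[str], Optional[bytes]]


@dataclass
class Page:
    entries: List[Entry] = field(default_factory=list)  # newest first
    next_page: Optional[str] = None  # hypothetical address of the older page


def build_log(messages: List[bytes], cas: Dict[str, bytes],
              pages: Dict[str, Page], page_size: int = 2,
              embed_last_k: int = 1) -> Page:
    """Build linked pages from messages (oldest first); return the head page."""
    n = len(messages)
    indexed = list(enumerate(messages))
    head, prev_addr = Page(), None
    for start in range(0, n, page_size):
        entries: List[Entry] = []
        for idx, payload in reversed(indexed[start:start + page_size]):
            if idx >= n - embed_last_k:
                # Embed the newest k payloads directly (no CAS round-trip).
                entries.append((h1(payload), None, payload))
            else:
                # Store older payloads in the CAS and keep only the pointer.
                cas[h2(payload)] = payload
                entries.append((h1(payload), h2(payload), None))
        head = Page(entries, prev_addr)
        prev_addr = f"page-{start // page_size}"
        pages[prev_addr] = head
    return head


def read_log(head: Page, cas: Dict[str, bytes],
             pages: Dict[str, Page]) -> List[bytes]:
    """Walk from the head page via next_page; return payloads newest first."""
    out: List[bytes] = []
    page: Optional[Page] = head
    while page is not None:
        for native, cas_addr, content in page.entries:
            data = content if content is not None else cas[cas_addr]
            if h1(data) != native:
                raise ValueError("payload does not match native hash")
            out.append(data)
        page = pages[page.next_page] if page.next_page else None
    return out
```

In this sketch the head page plays the role of what the name system would point to. A receiving node walks `next_page` until it reaches messages it already has, and it fetches from the CAS only for entries that are not embedded.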