# Problem Definition

This document compares Tor and Waku. The ["What is Tor"](#what-is-tor) section gives a quick overview of Tor and can be skipped by readers who are already familiar with it.


# What is Tor

As stated in the [Tor specifications](https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n142), Tor is a distributed overlay network designed to anonymize low-latency TCP-based applications such as web browsing, secure shell, and instant messaging. Clients choose a path through the network and build a `circuit`, in which each node (or `onion router` or `OR`) in the path knows its predecessor and successor, but no other nodes in the circuit. Traffic flowing down the circuit is sent in **fixed-size** `cells`, which are unwrapped by a symmetric key at each node (like the layers of an onion) and relayed downstream. Tor only works for TCP streams and can be used by any application with SOCKS support.

Tor works in three main steps:
- The Tor client fetches a list of Tor nodes (onion routers) from a directory server. Onion routers are volunteer-operated servers around the world that are listed publicly in a directory, so there is no concern about their identity being known.
- The Tor client picks a random path to the destination server. The path includes three hops. The client negotiates a separate set of encryption keys for each hop along the circuit, so that no single hop can trace the connection as it passes through. The message is then wrapped in **3 layers of encryption**.
  - For example, consider A sending message `M` to B through R1-R3, i.e., A -> R3 -> R2 -> R1 -> B. The message `M` is encrypted with 3 keys, in 3 layers, as `E(k3, E(k2, E(k1, M)))`, where each key is known to only one of the intermediate nodes.
  - Routing nodes only need to know their predecessor and successor in the circuit, but not the sender of the message.
  - Once a circuit has been established, many kinds of data can be exchanged and several different sorts of software applications can be deployed over the Tor network.
  - For efficiency, the Tor software uses the same circuit for connections that happen within the same ten minutes or so. Later requests are given a new circuit, to keep people from linking your earlier actions to the new ones.
  - No header is added to the message; otherwise it would be much easier to tell how far the message is from its destination.

- Tor's users employ this network by connecting through a series of **virtual tunnels** rather than making a direct connection, thus allowing both organizations and individuals to share information over public networks without compromising their privacy.
- Tor is an implementation of onion routing, which enables us to communicate anonymously over the internet.
- The more people use the Tor network, the stronger it gets, as it is easier to hide in a crowd of people that look exactly the same.
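
The layered encryption `E(k3, E(k2, E(k1, M)))` described above can be sketched as follows. This is a minimal illustration, not Tor's actual construction: the cipher is a toy XOR keystream built from SHAKE-256 (chosen only because it is in the Python standard library), and the names `keystream_xor`, `onion_wrap`, and `peel` are hypothetical.

```python
import hashlib
from typing import List


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHAKE-256 keystream derived from key.

    Illustration only. It has no nonce and no authentication, so it is NOT
    secure and is not the cipher Tor uses.
    """
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def onion_wrap(message: bytes, keys: List[bytes]) -> bytes:
    """Apply one layer per key, innermost first.

    With keys [k1, k2, k3] the result corresponds to E(k3, E(k2, E(k1, M))).
    """
    wrapped = message
    for key in keys:
        wrapped = keystream_xor(key, wrapped)
    return wrapped


def peel(cell: bytes, key: bytes) -> bytes:
    """Remove a single layer, as one relay on the path would."""
    return keystream_xor(key, cell)


if __name__ == "__main__":
    k1, k2, k3 = b"key-R1", b"key-R2", b"key-R3"
    onion = onion_wrap(b"hello B", [k1, k2, k3])
    # Path A -> R3 -> R2 -> R1 -> B: each relay peels exactly one layer.
    at_r2 = peel(onion, k3)
    at_r1 = peel(at_r2, k2)
    at_b = peel(at_r1, k1)
    print(at_b)
```

Each relay holds only its own key, so it can remove only its own layer and sees neither the plaintext (except the last hop) nor the other layers' keys.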

Tor messages are called cells, and each is 512 bytes long.
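
To illustrate why fixed-size cells hide the payload length, here is a small sketch that splits a payload into 512-byte cells and pads the last one. The layout (a 2-byte length prefix followed by zero padding) is an assumption made for this example and is not Tor's real cell format; `to_cells` and `from_cells` are hypothetical helpers.

```python
CELL_SIZE = 512      # cell size stated above
LEN_PREFIX = 2       # assumed: 2-byte big-endian length field (not Tor's layout)


def to_cells(payload: bytes) -> list:
    """Split payload into cells that are all exactly CELL_SIZE bytes long."""
    chunk = CELL_SIZE - LEN_PREFIX
    cells = []
    # max(..., 1) ensures an empty payload still produces one (empty) cell.
    for i in range(0, max(len(payload), 1), chunk):
        part = payload[i:i + chunk]
        cell = len(part).to_bytes(LEN_PREFIX, "big") + part
        cells.append(cell.ljust(CELL_SIZE, b"\x00"))  # zero-pad to full size
    return cells


def from_cells(cells: list) -> bytes:
    """Reassemble the original payload, dropping the padding."""
    out = b""
    for cell in cells:
        n = int.from_bytes(cell[:LEN_PREFIX], "big")
        out += cell[LEN_PREFIX:LEN_PREFIX + n]
    return out
```

An observer on the wire sees only a sequence of equally sized cells, so a 10-byte message and a 500-byte message look the same.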

Each router talks to many users at once and acts as an intermediary in their conversations. It may be the first node, the second node, or the third (exit) node of a circuit, and it does not really know which one it is.

So it is not an easy job to correlate the traffic and figure out what someone did.


## Security Considerations
Tor is all about **transport security**, and there is no anonymity guarantee for the data that the user sends over Tor. For example, an attacker may sniff the last connection in the Tor circuit, from the exit node to the destination server, and see someone's username and password in the clear. It is up to the user to use TLS or HTTPS for the connection. In that case, what is transported is simply an HTTPS request and reply that goes through Tor instead of the ISP router.


## Security Features
Below is the list of security features that Tor provides. In essence, however, all of these features come down to two things:
1. Tor hides (or makes it difficult to learn) the two ends of a communication, i.e., who is talking to whom.
2. It protects metadata, including:
   - The user's real identity
   - Precise location
   - OS
   - The browser used to surf the web

This means that, to make a fair comparison with Waku, we need to know whether and how Waku can achieve these two major features.

- Protects against traffic analysis by concealing the headers of Internet data packets:
How does traffic analysis work? Internet data packets have two parts: a data payload and a header used for routing. The data payload is whatever is being sent, whether that's an email message, a web page, or an audio file. Even if you encrypt the data payload of your communications, traffic analysis still reveals a great deal about what you're doing and, possibly, what you're saying. That's because it focuses on the header, which discloses source, destination, size, timing, and so on.
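
As a rough illustration of the point above, the sketch below shows an observer who never decrypts any payload and still learns who talks to whom from the headers alone. The `Packet` fields mirror the header items listed above (source, destination, size, timing); the class and function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str          # header: source address
    dst: str          # header: destination address
    size: int         # header-visible size in bytes
    timestamp: float  # observation time in seconds
    payload: bytes    # encrypted; the observer never reads this


def who_talks_to_whom(packets) -> Counter:
    """Count packets per (src, dst) pair using header fields only."""
    return Counter((p.src, p.dst) for p in packets)
```

Encrypting `payload` does not change the output of `who_talks_to_whom`, which is why Tor focuses on hiding the routing information rather than only the content.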

Protecting against traffic analysis means that no one knows who you are talking to. This translates into the following:
- **BROWSE FREELY** Tor is a censorship circumvention tool, allowing its users to reach otherwise blocked destinations or content. One reason for that is the pool of volunteer-run servers known as Tor relays.
- **DEFEND AGAINST SURVEILLANCE** Tor Browser prevents someone watching your connection from **knowing what websites you visit.** All anyone monitoring your browsing habits can see is that you're using Tor.
- **MULTI-LAYERED ENCRYPTION** The traffic is relayed and encrypted three times as it passes over the Tor network. The network is comprised of thousands of volunteer-run servers known as Tor relays. Note that this encryption serves anonymity more than confidentiality: the data that is passed to the final destination may still be unencrypted, which can compromise privacy.
- **BLOCK TRACKERS** Tor Browser isolates each website you visit so third-party trackers and **ads can't follow you**. Any **cookies automatically clear** when you're done browsing. I cannot compare this part with Waku, as I am not sure about the low-level details of how advertising would be performed in Waku.
- **RESIST FINGERPRINTING** Tor Browser aims to make all users look the same, making it **difficult** for you to be fingerprinted based on **your** **browser** and **device information**.


## Security Vulnerabilities
The Tor network does load sharing to protect against DoS attacks: if a single router is overloaded, anyone talking through that router will have a problem. Circuits are re-established about every 10 minutes, so the network is adaptive and traffic can take a different route.
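
The circuit rotation described above can be sketched as a small manager that picks three random relays and rebuilds the path once it is older than ten minutes. This is a simplification under stated assumptions: uniform random selection and the names `CircuitManager` and `get_circuit` are hypothetical, and real Tor path selection involves more than a uniform draw.

```python
import random

CIRCUIT_LIFETIME_S = 600  # "about every 10 minutes", as stated above
HOPS = 3                  # three hops per circuit


class CircuitManager:
    """Keeps one circuit and replaces it when it is too old (illustrative)."""

    def __init__(self, relays, rng=None):
        if len(relays) < HOPS:
            raise ValueError("need at least %d relays" % HOPS)
        self.relays = list(relays)
        self.rng = rng or random.Random()
        self.circuit = None
        self.built_at = None

    def get_circuit(self, now: float):
        """Return the current circuit, building a new one if it has expired."""
        expired = (
            self.built_at is None
            or now - self.built_at >= CIRCUIT_LIFETIME_S
        )
        if expired:
            # Assumption: uniform choice of distinct relays.
            self.circuit = self.rng.sample(self.relays, HOPS)
            self.built_at = now
        return self.circuit
```

Requests made within the same ten-minute window reuse the same path, while later requests get a fresh one, which limits how much activity can be linked to a single circuit.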


What if third parties control some of these nodes? For example, government agencies may want to know what is going on, so they control nodes in the hope that they eventually control the nodes next to both A and B. That is why the entry nodes are called guard nodes: you trust them and you do not pick them randomly.


- Weakness 1 (unsolvable): if the adversary controls both the first node and the exit node of a circuit, it can figure out what is going on.

- Weakness 2, end-to-end timing attack: Tor does not provide protection against end-to-end timing attacks. If your attacker can watch the traffic coming out of your computer, and also the traffic arriving at your chosen destination, he can use statistical analysis to discover that they are part of the same circuit.

- Weakness 3, traffic analysis: imagine you have the time signature of the messages sent by a single client. The incoming traffic to the destination server will be a mix of lots of messages, but if you can find the key points that match up with what the client sent, this can be used to deanonymize people. For example, if messages have a certain size and a certain tempo, an observer can figure out that the same messages came out on the other side of the network.
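
The statistical matching behind these timing and traffic-analysis weaknesses can be sketched as follows. The approach here (binning timestamps into windows and comparing flows by Pearson correlation) is one simple choice made for illustration, not a description of any specific published attack; all function names are hypothetical.

```python
import math


def bin_counts(timestamps, window: float, n_bins: int):
    """Count how many messages fall into each time window."""
    counts = [0] * n_bins
    for t in timestamps:
        idx = int(t // window)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def pearson(xs, ys) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def best_match(entry_times, exit_flows, window=1.0, n_bins=10):
    """Return the exit flow whose time signature best matches the entry flow."""
    entry = bin_counts(entry_times, window, n_bins)
    scores = {
        name: pearson(entry, bin_counts(times, window, n_bins))
        for name, times in exit_flows.items()
    }
    return max(scores, key=scores.get), scores
```

An adversary who sees the client's outgoing timestamps and several candidate flows at the destination would pick the candidate with the highest score. Encryption does not help here, because only timing is used.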

## Performance Concerns

Tor routers may be far apart from each other, so a message may be delayed before it reaches its destination.


# Waku

For now, let's focus on the Waku relay protocol, which is the transport layer. There are obviously anonymity concerns with the filter and store protocols as well, and all of them are discussed in their specs. In the following, I consider nodes that have the relay protocol mounted and act both as relayers and as publishers. Later, we can extend the security analysis to other types of nodes, like light nodes, and also consider the inclusion of other protocols.

Let's first have a look at the structure of Waku messages and the headers used while transporting them over the GossipSub protocol (i.e., Waku-Relay).
A Waku message contains:
- Payload which can be encrypted
- ContentTopic
- Version
- Timestamp
The Waku message then resides inside the `data` field of a pubsub message with the following fields:
- data
- topic
- seq#
- from
- sign
- key
In order to preserve anonymity, the relay protocol follows a strict no-sign policy, which means that the seq#, from, sign and key fields are omitted, as they carry information about the sender of the message. This also gives us some level of metadata protection, though I have to dig more into the use of IP addresses in the GossipSub protocol to make sure that the sender's IP does not get shared or used during the routing process.
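
The message structure and the no-sign policy above can be modelled with a short sketch. The classes below are a plain Python model of the fields listed in this section, not the actual Waku or libp2p types; the names `WakuMessage`, `PubsubMessage`, `apply_no_sign_policy`, and `is_anonymous` are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class WakuMessage:
    payload: bytes        # may be encrypted
    content_topic: str
    version: int
    timestamp: int


@dataclass(frozen=True)
class PubsubMessage:
    data: bytes                        # serialized WakuMessage
    topic: str                         # pubsub topic
    seqno: Optional[bytes] = None      # "seq#" in the list above
    from_peer: Optional[bytes] = None  # "from" in the list above
    signature: Optional[bytes] = None  # "sign" in the list above
    key: Optional[bytes] = None


def apply_no_sign_policy(msg: PubsubMessage) -> PubsubMessage:
    """Drop every field that identifies the sender; keep data and topic."""
    return replace(msg, seqno=None, from_peer=None, signature=None, key=None)


def is_anonymous(msg: PubsubMessage) -> bool:
    """True if none of the sender-related fields is present."""
    fields = (msg.seqno, msg.from_peer, msg.signature, msg.key)
    return all(f is None for f in fields)
```

A relay node applying such a check could reject any message for which `is_anonymous` is false, so that sender-identifying fields never propagate through the network.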

In order to address anonymity, we should understand how the two ends of a communication find each other.
The current approach is that the two ends of a communication find each other through pubsub topics. This means that, in order to preserve anonymity, we need to hide the relation between topics and their publishers and subscribers, i.e., to preserve publisher-topic anonymity and subscriber-topic anonymity.
However, consider the case where we use a distinct pubsub topic for each 1:1 or private group chat. For such a pubsub topic, only a limited number of relay nodes will be subscribed, so identifying them would be easy. Moreover, it would be easy to find out whether there is an active conversation between two parties.

In light of this observation, we need a large number of relay nodes to be involved in the pubsub topic over which nodes communicate. This means we should use a single pubsub topic, which would result in many relay nodes being subscribed to it. This is somewhat similar to what Tor requires: the more Tor relays there are, the better the anonymity. As such, I suggest using the Waku content topic to manage direct or group messaging.
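
The difference between per-chat pubsub topics and a single shared pubsub topic with content-topic filtering can be sketched as below. The topic names and the helpers `anonymity_set` and `messages_for` are hypothetical and only illustrate the argument; they are not part of any Waku API.

```python
from typing import Dict, List, Set, Tuple


def anonymity_set(subscriptions: Dict[str, Set[str]], pubsub_topic: str) -> Set[str]:
    """Nodes an observer can associate with a given pubsub topic."""
    return {
        node for node, topics in subscriptions.items()
        if pubsub_topic in topics
    }


def messages_for(content_topic: str,
                 received: List[Tuple[str, bytes]]) -> List[bytes]:
    """Local filtering: keep payloads whose content topic matches.

    Each received item is a (content_topic, payload) pair that arrived over
    the shared pubsub topic.
    """
    return [payload for topic, payload in received if topic == content_topic]


if __name__ == "__main__":
    per_chat = {
        "alice": {"chat-ab"}, "bob": {"chat-ab"},
        "carol": {"chat-cd"}, "dave": {"chat-cd"},
    }
    shared = {node: {"shared"} for node in per_chat}
    print(anonymity_set(per_chat, "chat-ab"))  # only the two chat partners
    print(anonymity_set(shared, "shared"))     # every relay node
```

With per-chat pubsub topics the subscriber set is exactly the conversation partners, while with one shared pubsub topic every node receives every message and selects what it needs by content topic locally.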

The data that goes inside the payload may still contain revealing information, such as details of the system configuration.

You need to use **protocol-specific support software** if you don't want the sites you visit to see your identifying information. In Tor, this exists in the form of Tor Browser: for example, you can use Tor Browser while browsing the web to withhold some information about your computer's configuration.


# Waku advantages
One potential advantage of Waku is that it is computationally lighter than Tor and does not require multiple layers of encryption and decryption. This would also lower the message transmission delay.

Another advantage is lighter key management: the sender does not have to establish shared keys with all the intermediate routers (as opposed to Tor).


No **end-to-end timing attack**: there is no single destination in Waku if topics are used deliberately and wisely. In Waku, the traffic pattern at all the relay nodes that are subscribed to the same topic is identical. However, we should be aware that the number of messages a sender sends will still be evident, which I believe is also the case in Tor.