This change implements connection manager that monitors 3 types of events:
1. update of the selected mail servers
2. disconnect from a mail server
3. errors for requesting mail history
When selected mail servers provided we will try to connect with as many as possible, and later disconnect the surplus. For example if we want to connect with one mail server and 3 were selected, we try to connect with all (3), and later disconnect with 2. It will to establish connection with live mail server faster.
If mail server disconnects we will choose any other mail server from the list of selected. Unless we have only one mail server. In such case we don't have any other choice and we will leave things as is.
If request for history was expired we will disconnect such peer and try to find another one. We will follow same rules as described above.
We will have two components that will rely on this logic:
1. requesting history
If target peer is provided we will use that peer, otherwise we will request history from any selected mail server that is connected at the time of request.
2. confirmation from selected mail server
Confirmation from any selected mail server will bee used to send a feedback that envelope was sent.
I will add several extensions, but probably in separate PRs:
1. prioritize connection with mail server that was used before reboot
2. disconnect from mail servers if history request wasn't expired but failed.
3. wait some time in RequestsMessage RPC to establish connection with any mail server
Currently this feature is hidden, as certain changes will be necessary in status-react.
partially implements: https://github.com/status-im/status-go/issues/1285
This commit updates geth to 1.8.17 and adds a possibility to enable metrics during compilation time.
The cascade of issues forced us to upgrade geth to 1.8.17 in order to allow enabling metrics during compilation time. 1.8.17 introduced `NodeID` refactoring and `enode` package which affected our peers pool and integration with Discovery V5.
- Skipped keys
The purpose of limiting the number of skipped keys generated is to avoid a dos
attack whereby an attacker would send a large N, forcing the device to
compute all the keys between currentN..N .
Previously the logic for handling skipped keys was:
- If in the current receiving chain there are more than maxSkip keys,
throw an error
This is problematic as in long-lived session dropped/unreceived messages starts
piling up, eventually reaching the threshold (1000 dropped/unreceived
messages).
This logic has been changed to be more inline with signals spec, and now
it is:
- If N is > currentN + maxSkip, throw an error
The purpose of limiting the number of skipped keys stored is to avoid a dos
attack whereby an attacker would force us to store a large number of
keys, filling up our storage.
Previously the logic for handling old keys was:
- Once you have maxKeep ratchet steps, delete any key from
currentRatchet - maxKeep.
This, in combination with the maxSkip implementation, capped the number of stored keys to
maxSkip * maxKeep.
The logic has been changed to:
- Keep a maximum of MaxMessageKeysPerSession
and additionally we delete any key that has a sequence number <
currentSeqNum - maxKeep
- Version
We check now the version of the bundle so that when we get a bundle from
the same installationID with a higher version, we mark the previous
bundle as expired and use the new bundle the next time a message is sent
* Update to geth v1.8.14
* Remove patches that were merged upstream
* Apply patches before 0016
* Fix 0016 and apply it
* Apply everything else
* Pass gas limit as a second argument to simulated backend
Update vendor
Integrate rendezvous into status node
Add a test with failover using rendezvous
Use multiple servers in client
Use discovery V5 by default and test that node can be started with rendezvous discovet
Fix linter
Update rendezvous client to one with instrumented stream
Address feedback
Fix test with updated topic limits
Apply several suggestions
Change log to debug for request errors because we continue execution
Remove web3js after rebase
Update rendezvous package
* mailserver sends envelopes in descending order
* add limit value in mailserver request payload
* mailserver sends messages up to the limit specified in the request
* update Archive method to return key and error
* processRequest returns the next page cursor
* add cursor to mailserver request
* add limit and cursor to request payload
* fix request limit encoding
* wait for request completed event in TrackerSuite/TestRequestCompleted
* add cursor to mailserver response
* fix cursor position in payload
* add e2e test for mail server pagination
* validate mail server response size
* remove old limitReached var
* fix lint warnings
* add whisper patch
* fix tests after rebase
* check all return values to avoid lint warnings
* check that all messages have been retrieved after 2 paginated requests
* fix lint warnings
* rename geth patch
* merge mailserver patches into one
* add last envelope hash to mailserver response and EventEnvelopeAvailable event
* update whisper patch
* add docs to MailServerResponse
* update whisper patch
* fix tests and lint warnings
* send mailserver response data on EventMailServerRequestCompleted signal
* update tracker tests
* optimise pagination test waiting for mailserver to archive only before requesting
* rollback mailserver interface changes
* refactoring and docs changes
* fix payload size check to determine if a limit is specified
* add more docs to the processRequest method
* add constants for request payload field lengths
* add const noLimits to specify that limit=0 means no limits
- Replace deprecated common.Hex with hexutil.Encode.
- Remove upstreamed 0010-geth-17-fix-npe-in-filter-system.patch.
- Remove upstreamed 0020-discv5-metrics.patch.
- Remove upstreamed 0026-ethdb-error-deadlock.patch.
- Update goleveldb to same version used by geth 1.8.11.
- Update PublicTransactionPoolAPI.GasPrice return type to match type in internal geth interface.
* refactor TestRequestMessageFromMailboxAsync to use s.requestHistoricMessages helper
* send p2pRequestResponseCode from mailserver
* send p2p message response to after sending all historic messages
* mailserver sends `whisper.NewSentMessage` as response
* add mailserver Client and p2pRequestAckCode watchers
* send event with envelopeFeed when p2pRequestAckCode is received
* test request completed event in tracker
* rename mailserver response events and code to RequestCompleteCode
* wait for mailserver response in e2e test
* use SendHistoricMessageResponse method name for mailserver response
* fix lint warnings
* add mailserver request expiration
* send mailserver response without envelope
* add `ttl` to Request struct in shhext_requestMessages
* test that tracker calls handler.MailServerRequestExpired
* add geth patch
* rename TTL to Timeout
* split tracker.handleEvent in multiple methods
Other changes:
* needed to patch that loop implementation in Discover V5 implementation in go-ethereum,
* fixed TestStatusNodeReconnectStaticPeers,
* fixed TestBackendAccountsConcurrently.
This change makes invalidation mechanism more aggressive. With a primary goal to invalidate short living nodes faster. In current setup any node that became known in terms of discovery will stay in this state until it will fail to respond to 5 queries. Removing them earlier from a table allows to reduce latency for finding required nodes.
The second change, one adds a version for discovery, separates status dht from ethereum dht.
After we rolled out discovery it became obvious that our boot nodes became spammed with irrelevant nodes. And this made discovery process very long, for example with separate dht discovery takes ~2s, with mutual dht - it can take 1m-10m and there is still no guarantee to find a max amount of peers, cause status nodes is a very small part of whole ethereum infra.
In my understanding, we don't need to be a part of ethereum dht, and lower latency is way more important for us.
Closes: #941
Partially closes: #960 (960 requires futher investigations on devices)
Now if Add is to be called it will be called before Wait, this is achieved
by doing following:
- if cancel() gets lock first and closes channelCh before spawnSync is called
we will exit right away
- if not than we will ensure that we hold a lock until syncers are spawned
so that cancel() will be blocked for this time and it will prevent whole Terminate() from
progressing
This change adds adds an ability to use different source of time for whisper:
when envelope is created it is used to set expiry
to track when envelope needs to be expired
This time is then used to check validity of the envelope when it is received. Currently If we receive an envelope that is sent from future - peer will get disconnected. If envelope that was received has an expiry less then now it will be simply dropped, if expiry is less than now + 10*2 seconds peer will get dropped.
So, it is clear that whisper depends on time. And any time we get a skew with peers that is > 20s reliability will be grealy reduced.
In this change another source of time for whisper will be used. This time source will use ntp servers from pool.ntp.org to compute offset. When whisper queries time - this offset will be added/substracted from current time.
Query is executed every 2 mins, queries 5 different servers, cut offs min and max and the computes mean value. pool.ntp.org is resolved to different servers and according to documentation you will rarely hit the same.
Closes: #687
* Rebase on 1.8.5
* Remove outdated patches and apply all others
* Use shh_post that returns hash
* Use bloom filter for request to mailserver
* Remove tests for sending messages without subbing first
* Fix deadlock in ethdb
* Expect null if receipt is not yet created
* Subscribe to messages before sending them in whisper test
* Add default peer limits configuration
If discovery is enabled for a given cluster - we will set a default
expected number of peers for each enabled service. For example:
- if cluster is rinkeby has a discovery enabled we will
check which services are enabled
- if whisper is enabled we will set min and max limits by default
- if les is enabled and infura is not used we will set limits too
When statusd is used - configuration must be provided using configuration
supported by statusd.
* Fix deadlock in les peer set