This fix puts an end to a saga that essentially start during the
Status Prague Meetup at the end of October 2018. At the time we were
experiencing massive issues with `Connecting...` spinners in the app in the
venue we rented. We were pulling our hairs out what to do and we could not
find the cause of the issue at the time.
Three months later I deployed the following change:
https://github.com/status-im/infra-eth-cluster/commit/63a13eed
Which used `iptables` to map the `443` port onto our `30504` Status node port
using `PREROUTING` chain and `REDIRECT` jump in order to fix issues people
have been complaining about when using WiFi networks in various venues:
https://github.com/status-im/status-react/issues/6351
Our thinking when trying to resolve the reported issue assumed that some
networks might block outgoing connections on non-standard ports other than
the usual `80`(HTTP)/`443`(HTTPS) which would disrupt Status connectivity.
While this fix could have indeed helped a few edge cases, what it really
did was cause the Status node to stop seeing actual public IPs of the clients.
But __pure accident__ this change caused the code we inherited from
`go-ethereum` implementation of DevP2P protocol to stop throttling new
incoming connections, because the IP as which they appeared was a
`172.16.0.0/12` network address of the Docker bridge.
The `go-ethereum` code used the `!netutil.IsLAN(remoteIP)` check to
avoid throttling connections from local addresses, which included the
local Docker bridge address:
https://github.com/status-im/status-go/blob/82680830/vendor/github.com/ethereum/go-ethereum/p2p/netutil/net.go#L36
The fix intended to target a small number of networks with fortified
firewall configuration accidentally resolved our issues with
`Connecting...` prompts that our application showed us en masse during
our Prauge Meetup. Part of the reason for that is that venues like that
normally give out local IP addresses and use NAT to translate them onto
the only public IP address they possess.
Since out application is supposed to be usable from within networks
behind NAT like airport WiFi networks for example, it makes no sense to
keep the inbound connection throttle time implemented in `go-ethereum`.
I'm leaving `inboundThrottleTime` in because it's used to calculate
value for `dialHistoryExpiration` in:
`vendor/github.com/ethereum/go-ethereum/p2p/dial.go`
I believe reducing that value one we deploy this change should also
increase the speed with which the Status application is able to reconnect
to a node that was temporarily unavailable, instead waiting the 5*30 seconds.
Research issue: https://github.com/status-im/infra-eth-cluster/issues/35
Signed-off-by: Jakub Sokołowski <jakub@status.im>
* add multiaccount support
add multi account ImportPrivateKey and StoreAccount
test derivation from normal keys
* add multiaccount to mobile pkg
* use multiaccount params structs from the mobile pkg
* move multiaccount tests together with the other lib tests
* fix codeclimate warning and temporarily increase methods threshold
* split library_test_utils.go to avoid linter warnings
This commit updates geth to 1.8.17 and adds a possibility to enable metrics during compilation time.
The cascade of issues forced us to upgrade geth to 1.8.17 in order to allow enabling metrics during compilation time. 1.8.17 introduced `NodeID` refactoring and `enode` package which affected our peers pool and integration with Discovery V5.
* Update to geth v1.8.14
* Remove patches that were merged upstream
* Apply patches before 0016
* Fix 0016 and apply it
* Apply everything else
* Pass gas limit as a second argument to simulated backend
* mailserver sends envelopes in descending order
* add limit value in mailserver request payload
* mailserver sends messages up to the limit specified in the request
* update Archive method to return key and error
* processRequest returns the next page cursor
* add cursor to mailserver request
* add limit and cursor to request payload
* fix request limit encoding
* wait for request completed event in TrackerSuite/TestRequestCompleted
* add cursor to mailserver response
* fix cursor position in payload
* add e2e test for mail server pagination
* validate mail server response size
* remove old limitReached var
* fix lint warnings
* add whisper patch
* fix tests after rebase
* check all return values to avoid lint warnings
* check that all messages have been retrieved after 2 paginated requests
* fix lint warnings
* rename geth patch
* merge mailserver patches into one
* add last envelope hash to mailserver response and EventEnvelopeAvailable event
* update whisper patch
* add docs to MailServerResponse
* update whisper patch
* fix tests and lint warnings
* send mailserver response data on EventMailServerRequestCompleted signal
* update tracker tests
* optimise pagination test waiting for mailserver to archive only before requesting
* rollback mailserver interface changes
* refactoring and docs changes
* fix payload size check to determine if a limit is specified
* add more docs to the processRequest method
* add constants for request payload field lengths
* add const noLimits to specify that limit=0 means no limits
- Replace deprecated common.Hex with hexutil.Encode.
- Remove upstreamed 0010-geth-17-fix-npe-in-filter-system.patch.
- Remove upstreamed 0020-discv5-metrics.patch.
- Remove upstreamed 0026-ethdb-error-deadlock.patch.
- Update goleveldb to same version used by geth 1.8.11.
- Update PublicTransactionPoolAPI.GasPrice return type to match type in internal geth interface.
* refactor TestRequestMessageFromMailboxAsync to use s.requestHistoricMessages helper
* send p2pRequestResponseCode from mailserver
* send p2p message response to after sending all historic messages
* mailserver sends `whisper.NewSentMessage` as response
* add mailserver Client and p2pRequestAckCode watchers
* send event with envelopeFeed when p2pRequestAckCode is received
* test request completed event in tracker
* rename mailserver response events and code to RequestCompleteCode
* wait for mailserver response in e2e test
* use SendHistoricMessageResponse method name for mailserver response
* fix lint warnings
* add mailserver request expiration
* send mailserver response without envelope
* add `ttl` to Request struct in shhext_requestMessages
* test that tracker calls handler.MailServerRequestExpired
* add geth patch
* rename TTL to Timeout
* split tracker.handleEvent in multiple methods
Other changes:
* needed to patch that loop implementation in Discover V5 implementation in go-ethereum,
* fixed TestStatusNodeReconnectStaticPeers,
* fixed TestBackendAccountsConcurrently.
This change makes invalidation mechanism more aggressive. With a primary goal to invalidate short living nodes faster. In current setup any node that became known in terms of discovery will stay in this state until it will fail to respond to 5 queries. Removing them earlier from a table allows to reduce latency for finding required nodes.
The second change, one adds a version for discovery, separates status dht from ethereum dht.
After we rolled out discovery it became obvious that our boot nodes became spammed with irrelevant nodes. And this made discovery process very long, for example with separate dht discovery takes ~2s, with mutual dht - it can take 1m-10m and there is still no guarantee to find a max amount of peers, cause status nodes is a very small part of whole ethereum infra.
In my understanding, we don't need to be a part of ethereum dht, and lower latency is way more important for us.
Closes: #941
Partially closes: #960 (960 requires futher investigations on devices)
Now if Add is to be called it will be called before Wait, this is achieved
by doing following:
- if cancel() gets lock first and closes channelCh before spawnSync is called
we will exit right away
- if not than we will ensure that we hold a lock until syncers are spawned
so that cancel() will be blocked for this time and it will prevent whole Terminate() from
progressing
This change adds adds an ability to use different source of time for whisper:
when envelope is created it is used to set expiry
to track when envelope needs to be expired
This time is then used to check validity of the envelope when it is received. Currently If we receive an envelope that is sent from future - peer will get disconnected. If envelope that was received has an expiry less then now it will be simply dropped, if expiry is less than now + 10*2 seconds peer will get dropped.
So, it is clear that whisper depends on time. And any time we get a skew with peers that is > 20s reliability will be grealy reduced.
In this change another source of time for whisper will be used. This time source will use ntp servers from pool.ntp.org to compute offset. When whisper queries time - this offset will be added/substracted from current time.
Query is executed every 2 mins, queries 5 different servers, cut offs min and max and the computes mean value. pool.ntp.org is resolved to different servers and according to documentation you will rarely hit the same.
Closes: #687
* Rebase on 1.8.5
* Remove outdated patches and apply all others
* Use shh_post that returns hash
* Use bloom filter for request to mailserver
* Remove tests for sending messages without subbing first
* Fix deadlock in ethdb
* Expect null if receipt is not yet created
* Subscribe to messages before sending them in whisper test