With the introduction of the new non-blocking code, events for the
peerpool might come out-of-order. That's generally not an issue, but it
made tests fail.
I have changed the code so that order is more consistent (It's still
theoretically possible that a stop signal would arrive out of order in a
real scenario, but impact is low and I don't want to change this code
too much).
There was another deadlock in the peer pool.
Because we made the event handler asynchrnous, another deadlock popped
up, as the loop locks the global peerpool lock before processing events.
But the handlers also take the global look, effectively resulting in the
same situation we had before, i.e the loop is not running.
THE LOOP MUST BE RUNNING AT ALL TIMES OTHERWISE THE SERVER HANGS.
There might be an issue on how we handle metrics, which causes the p2p
server to hang.
updateNodeMetrics calls a method on the p2p server, which
blocks until the server is available:
e60f425b45/vendor/github.com/ethereum/go-ethereum/p2p/server.go (L301)e60f425b45/vendor/github.com/ethereum/go-ethereum/p2p/server.go (L746)
If there's back-pressure on the peer event feed
e60f425b45/vendor/github.com/ethereum/go-ethereum/p2p/server.go (L783)
The event channel above might become while updateNodeMetrics
is called, which means is never consumed, the server blocks on publishing on
it, and the two will deadlock (server waits for the channel above to be consumed,
this code waits for the server to respond to peerCount, which is in the same
event loop).
Calling it in a different go-routine will allow this code to keep
processing peer added events, therefore the server will not lock and keep processing requests.
This is a bit complicated, so:
1) Peerpool was subscribing to `event.Feed`, which is a global event
emitter for ethereum.
2) The p2p.Server was publshing on `event.Feed`, this triggered in the
same routine a publish on `event.Feed`.
3) Peerpool was listening to `event.Feed`, react on it, and in the same
routine, trigger some code on p2p.Server that would publish on
`event.Feed`
This meant that if the size of the channel was unbufferred, it would deadlock, as
peerPool would not be consuming when it would publish (the same go
routine publishes and listen effectively, through a lot of indirection
and non-buffered channels, p2p.Server->event.Feed)
The channel though was a buffered channel with size 10, and this meant that most of the times is
fine.
The issue is that peerpool is not the only producer to this channel.
So it's possible that while is processing an event, the buffer would
fill up, and it would hange trying to publish, and nobody is listening
to the channel, hanging EVERYTHING.
At least that's what I think, needs to be tested, but definitely an
issue.
I kept the code changes to a minimum, this code is a bit hairy, but it's
fairly critical so I don't want to make too many changes.
We were not locking before accessing the contacts map and it would panic
in some cases.
I have changed the code to pull contacts from db so we move away from
having locks.
There seems to be an issue with version 1.3, querying for topics on
postgres returns
and error:
```
panic: pq: invalid byte sequence for encoding "UTF8"
```
Upgrading pq fixes the issue
¯\_(ツ)_/¯
Previous I added the prefix directly in `docker-image` target.
But that doesn't make sense if you override the `RELEASE_TAG`.
Signed-off-by: Jakub Sokołowski <jakub@status.im>
When an error occours (or the peer disconnect), we return from decorator
as an error is published to the chan.
There are though still 5 or 6 goroutines that want to write on that
channel and at least 3/4 of them will be hanging, leaving them stuck
publishing on the chan.
This though is probably not the cause of
https://github.com/status-im/infra-eth-cluster/issues/39
fills up the stack trace with hung go routines.
There was a bug on status-react where it would save filters that were
not listened to.
This commit adds a task to clean up those filters as they might result
in long syncing times.
This commit also returns topics/ranges/mailserves from messenger in
order to make the initialization of the app simpler and start moving
logic to status-go.
It also removes whisper from vendor.
One of the issues we noticed is that the partitioned topic
in push notification is heavy in traffic, as any user using a particular
mailserver will use that partitioned topic to register for PNs.
This commit moves from the partitioned topic to the personal topic of
the PN server, so it does not clash with other users that might happen
to have the same partitioned topic as the mailserver, resulting in long
sync times.
Another issue that will need to be addressed separately is that once you
send a message to a topic, because of the way how waku/whisper works,
you will have to register to that topic, meaning that you will receive
that data. Currently waku does not support unsubscribing from a topic
without logging in and out, so that needs also to be addressed.
When sending a message in a private group chat we send the whole history
for redundancy and allow out-of-order processing.
This can be very expensive in some chats, resulting in long delay when
sending a message and calculating the POW.
This commit improves the performance by only forwarding the events
necessary for the user to be able to construct a group chat and process
correctly the message.
This commit fixes a couple of issues:
1) Emojis were sent to any member of the group chat, regardless of
whether they joined
2) We don't want to wrap emojis, as there's no need to do so, only
messages are to be wrapped
The topics field was not passed to the mailserver, which meant that
queries were still using the old bloom filter.
Hopefully this is the last place where we need to pass this.
Using `latest` tag is dangerous for non-technical users.
And updating `latest` tag willy-nilly is also bad.
Signed-off-by: Jakub Sokołowski <jakub@status.im>
This commit re-introduces a feature that we lost during the migration to
status-go.
Messages are cached for a couple of days if processed correctly by
status-go, to avoid performance issues.
* Initial work on expanding Local Notifications
Adding functionality to support multiple notification types in Notification.Body. Currently have a bug that I think is caused by a the jsonMarshal func not working as intented, need to resolve this next before proceeding
* Fixed json.Marshaller issue and implemented json.Unmarshaller
* Tweak errors, go convention is errors don't begin with capital letters
* Added notificationMessageBody with un/marshalling
Also removed the Body interface
* Added check for bodyType mismatch
* Implement building and sending new message notifications
* Refactor to remove cycle imports
* Resolved linting issue ... Hopefully
* Resolving an implicit memory aliasing in a for loop
* version bump
* Added Notification.Category consts
Since we fixed re-sending of messages, it has been noticed a performance
degradation in private group chats.
This is due to too aggressive re-sending of messages.
This commit disables resending of private group chat messages for now
(same as 1.9, so no changes), but keeps it enabled for 1-to-1s.
In some instances the retry mechanism would get into a busy loop.
That's due to the fact that we would fetch some non-retriable
notifications but not act on them.
This commit fixes the issue by filtering them from the database query
and making sure that we at least wait 1 second.
In some instances the communities migration would be skipped but not
marked as `dirty`.
This commit addresses the issue by:
- Making sure that if dirty is set the migration is not skipped but
replayed
- If the version is on the communities migration and dirty is false, we
check for the presence of the communities table. If not present we
replay the communities migration.
- Make community_id field in user_messages nullable
It also removes all the `down` migration, as we can't use them
effectively, as explained in the README.md added.
This commit changes the behavior so that when the image is updated it
will be published on the contact code topic.
If that does not happen because we are offline, it will be scheduled for
the next time we are online.