We currently don't have confidence in the reliability of messaging in Status. The reasons for this lack of confidence range all the way from how 'read' status is displayed in the app, to trivial - and non-trivial - bugs, to bad uptime of critical infrastructure, to how push notifications are decoupled from Whisper messaging, to a fundamental lack of (understanding of) the reliability guarantees the Whisper protocol provides.
What all these issues have in common is that they cause the end user to not have confidence in their messages being delivered and read. This swarm aims to attack this from a user point of view. It is expected that additional swarms might be spawned for more core technical areas as these are identified (e.g. network visibility or push notifications v2).
This section is based on user feedback spawned as part of 75-status-everyday (https://docs.google.com/document/d/1pkfZWxr9I0AqidEuOfogzxEXrcg6ofs2bUkh-HSKD6o/edit)
After iteration 1 is complete, the swarm will meet to discuss the results of the UXR survey and the send/receive ratio results, review the current UX, and discuss future iterations. Work will then be planned for a future iteration, or the swarm will be closed.
- 95% of a group of 100 users surveyed - who don't have additional context beyond Status providing a p2p IM capability - using the app for an extended period of time, should answer 'yes' to the question: "Do you trust Status to deliver messages for you?" (and possibly variants of this).
This is fundamentally a soft or qualitative goal. It is thus necessary but not necessarily sufficient, and additional harder numbers might be used as we develop the capability to measure this.
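As a rough illustration of how the survey goal and a harder number such as the send/receive ratio could be checked, here is a minimal sketch. The function names, the input shapes, and the definition of the ratio as received divided by sent are assumptions made for illustration; they are not an existing Status tool or metric definition.

```python
from typing import Sequence

# Taken from the goal above: 95% of surveyed users should answer 'yes'.
TARGET_YES_RATE = 0.95


def yes_rate(answers: Sequence[bool]) -> float:
    """Fraction of respondents who answered 'yes' (True)."""
    if not answers:
        raise ValueError("no survey answers provided")
    return sum(answers) / len(answers)


def goal_met(answers: Sequence[bool], target: float = TARGET_YES_RATE) -> bool:
    """True when the observed 'yes' rate reaches the target."""
    return yes_rate(answers) >= target


def delivery_ratio(sent: int, received: int) -> float:
    """Hypothetical send/receive ratio: messages received / messages sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return received / sent


if __name__ == "__main__":
    # Illustrative numbers only, not real survey or test results.
    survey = [True] * 95 + [False] * 5
    print(goal_met(survey))
    print(delivery_ratio(sent=200, received=190))
```

Keeping the threshold as a named constant makes it easy to tighten or relax the target once real measurements exist.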
- The Chat board was like night and day compared to the previous way of working
- Lukasz:
- Switching to teams helped focus a lot
- Almost no Instabug reports regarding reliability recently
- Dmitry: switch to teams, focus on Beta release, reliability and performance increase
- Pedro: having a Chat team was helpful any time a high-priority bug or a question appeared. People know where to go to discuss that without cluttering #core.
- What could we do better?
- Andrea: The measuring part could have been more fleshed out; we focused more on fixing current issues than on preventing new ones from happening. Overall, these are minor points compared to the benefits of the initiative.
- Chad:
- Find a better way to measure
- Still had a handful of users complaining
- Have a proactive way to spot regressions
- Lukasz:
- Maybe try to monitor on cluster side in the future?
- Some issues with Jenkins, seems to be solved now
- Pedro:
- Better tooling, e.g. reliable surveys + Instabug
- Eric: maybe the team effort can work just as well as a swarm; we just need better control of when a swarm is ready to start
- Dmitry: reliability metrics (in-app and cluster monitoring)
- What do we want to implement/improve?
- Better monitoring and better communication between infra and chat teams