987a9e8707
* feat_: report panics to sentry * test_: sentry options, params and utils * feat_: toggle sentry with centralized metrics * test_: sentry init, report and close * refactor_: rename public api to generic * docs_: sentry * fix_: typo in internal/sentry/README.md Co-authored-by: osmaczko <33099791+osmaczko@users.noreply.github.com> * fix_: linter --------- Co-authored-by: osmaczko <33099791+osmaczko@users.noreply.github.com> |
||
---|---|---|
.. | ||
README.md | ||
options.go | ||
options_test.go | ||
params.go | ||
params_test.go | ||
sentry.go | ||
sentry_test.go | ||
stacktrace.go | ||
stacktrace_test.go |
README.md
Description
This package encapsulates Sentry integration. So far:
- only for status-go (including when running as part of desktop and mobile)
- only for panics (no other error reporting)
Sentry is only enabled for users that both:
- Opted-in for metrics
- Use builds from our release CI
🛬 Where
We use self-hosted Sentry: https://sentry.infra.status.im/
🕐 When
Which panics are reported:
- When running inside
status-desktop
/status-mobile
:- during API calls in
/mobile/status.go
- inside all goroutines
- during API calls in
- When running
status-backend
:- any panic
Which panics are NOT reported:
- When running inside
status-desktop
/status-mobile
:- during API calls in
/services/**/api.go
NOTE: These endpoints are executed throughgo-ethereum
's JSON-RPC server, which internally recovers all panics and doesn't provide any events or option to set an interceptor. The only way to catch these panics is to replace the JSON-RPC server implementation.
- during API calls in
- When running
status-go
unit tests:- any panic
NOTE: Go internally runs tests in a goroutine. The only way to catch these panics in tests is to manuallydefer sentry.Recover()
in each test. This also requires a linter (similar tolint-panics
) that checks this call is present.
This is not a priority right now, because:- We have direct access to failed tests logs, which contain the panic stacktrace.
- Panics are not expected to happen. Test must be passing to be able to merge the code. So it's only possible with a flaky test.
- This would only apply to nightly/develop jobs, as we don't gather panic reports from PR-level jobs.
- any panic
📦 What
Full list can be found in sentry.Event
.
Notes regarding identity-disclosing properties:
-
ServerName
- completely removed from all events -
Stacktrace
:- No private user paths are exposed, as we only gather reports from CI-built binaries.
-
TraceID
- so far will be unique for each eventTrace: A collection of spans representing the end-to-end journey of a request through your system that all share the same trace ID.
More details in sentry docs.
Configuration
There are 2 main tags to identify the error. The configuration is a bit complicated, but provides full information.
Parameters
Environment
Defining question | Where it is running? | ||||||||||||
Set time | - production can only be set at build time to prevent users from hacking the environment- All others can be set at runtime, because on CI we sometimes use same build for multiple environments |
||||||||||||
Expected values |
development and ci-pr are dropped, because we only want to consider panics from stable code |
Context
Defining question | What is the executable for the library? | ||||||||||||
Set time | Always at build-time | ||||||||||||
Expected values |
|
Environment variables
To cover these requirements, I added these environment variables:
Environment variable | Provide time | Description |
---|---|---|
SENTRY_DSN |
- At build time with direct call to sentry.Init - At runtime with InitializeApplication endpoint |
Sentry DSN to be used |
SENTRY_CONTEXT_NAME SENTRY_CONTEXT_VERSION |
Build time | Execution context of status-go |
SENTRY_PRODUCTION |
Build time | When true or 1 :-Defines if this is a production build -Sets environment to production -Has precedence over runtime SENTRY_ENVIRONMENT |
SENTRY_ENVIRONMENT |
Run time | Sets the environment. Has no effect when SENTRY_PRODUCTION is set |
Client integration
-
Set
SENTRY_CONTEXT_NAME
andSENTRY_CONTEXT_VERSION
at status-go build time -
Provide
sentryDSN
to theInitializeApplication
call. DSN must be kept private and will be provided by CI. Expect aSTATUS_GO_SENTRY_DSN
environment variable to be provided by CI.Why can't we consume `STATUS_GO_SENTRY_DSN` directly in status-go build?
In theory, we could. But this would require us to mix approaches of getting the env variable to the code. Right now we prefer `go:generate + go:embed` approach (e.g. https://github.com/status-im/status-go/pull/6014), but we can't do it in this case, because we must not write the DSN to a local file, which would be a bit vulnerable. And I don't want to go back to `-ldflags="-X github.com/status-im/status-go/internal/sentry.sentryDSN=$(STATUS_GO_SENTRY_DSN:v%=%)` approach.
Implementation details
- We recover from panics in here:
fcedb01316/common/utils.go (L102)
fcedb01316/mobile/callog/status_request_log.go (L69)
fcedb01316/cmd/status-backend/main.go (L40)
This covers all goroutines, because we have a linter to check that all goroutines havedefer common.LogOnPanic
. - Sentry is currently initialized in 2 places:
InitializeApplication
- covers desktop/mobile clientsfcedb01316/mobile/status.go (L105-L108)
- in
status-backend
- covers functional tests:fcedb01316/cmd/status-backend/main.go (L36-L39)