Debounce is “the number of successive failures
permitted before check will be marked as failed”.
It is very useful to avoid alerts on expected hiccups.
For checks whose retry logic lies in Cabot using `frequency`
(which is the case for Graphite, HTTP, and ICMP checks),
it makes sense for the debounce to count how many times Cabot has retried.
For JenkinsChecks, however, we have no control over
how often Cabot checks the job. This means that even a
debounce of e.g. 5 can trigger an alert after a single job failure.
A simpler implementation of this was to loop over the
recent results, count how many distinct jobs have failed
(using the job number stored in the `status_check_result`),
and set the status to fail if this is higher than the debounce.
However, Cabot only considers the last 10 results (hardcoded value).
Since Cabot checks the job at fairly high frequency (or at least a
frequency higher than the Jenkins run frequency), this can mean
the status would switch back to pass after 10 checks of a single job failure.
We thus need to enrich the StatusCheckResult data model
to store that information.
- Add field `consecutive_failures` to StatusCheckResult model
(and associated migration).
- Retrieve from Jenkins the last good build, and compute from
that the number of consecutive failures
- Also display the consecutive failures in the Check results page
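
The computation in the second bullet can be sketched as follows. This is only an illustration under stated assumptions, not the code of this change: the function names are hypothetical, the input is assumed to be a dict shaped like the Jenkins job JSON (`lastCompletedBuild` and `lastSuccessfulBuild`, each carrying a `number`), and build numbers are assumed to be sequential and to start at 1.

```python
def count_consecutive_failures(job):
    """Count completed builds since the last good one.

    `job` is assumed to look like the Jenkins job JSON, e.g.
    {"lastCompletedBuild": {"number": 12}, "lastSuccessfulBuild": {"number": 9}}.
    """
    last_completed = job.get("lastCompletedBuild")
    if last_completed is None:
        # No completed build yet, so nothing can have failed.
        return 0
    last_good = job.get("lastSuccessfulBuild")
    if last_good is None:
        # The job never succeeded: assume every build so far failed
        # (relies on build numbers starting at 1).
        return last_completed["number"]
    # Guard against a good build that is newer than the last completed one.
    return max(last_completed["number"] - last_good["number"], 0)


def is_failing(consecutive_failures, debounce):
    """Fail only once the failures exceed the debounce."""
    return consecutive_failures > debounce
```

Because the count is derived from Jenkins build numbers rather than from the last 10 stored results, it does not depend on how often Cabot polls the job.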
Closes #537
A new endpoint to return the users who are currently on call.
Also added a serialize method to plugins to allow plugin data (e.g.
slack alias) to get returned with the method.
This is needed when running behind a reverse proxy,
otherwise we get redirected to plain HTTP which does
not match the expected URL in the provider.
Per the python-social-auth documentation:
```
On projects behind a reverse proxy that uses HTTPS, the redirect
URIs can have the wrong schema (http:// instead of https://) if
the request lacks the appropriate headers, which might cause
errors during the auth process.
To force HTTPS in the final URIs set this setting to True
```
https://python-social-auth-docs.readthedocs.io/en/latest/configuration/settings.html
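
For reference, the setting described in the quoted documentation is `SOCIAL_AUTH_REDIRECT_IS_HTTPS`. A minimal sketch of how it would appear in a Django settings module (where exactly it lives in the settings file is an assumption):

```python
# Django settings module (sketch).
# Make python-social-auth build https:// redirect URIs when the app
# sits behind a reverse proxy that terminates HTTPS.
SOCIAL_AUTH_REDIRECT_IS_HTTPS = True
```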
The ICMPStatusCheck was performed by shelling out
to the `ping` executable, by building a string for the
command and supplying it to `subprocess.Popen`.
This is dangerous because it allows for shell injections,
as the content of the command is partly user controlled:
one could set the instance address to `8.8.8.8; rm -rf /`,
which would be happily executed.
This is avoided by passing a list of strings to subprocess,
so that the arguments are all passed to ping. The above example
then results in `ping: bad address '8.8.8.8 ; rm -rf /'`
The issue was detected by the Bandit linter:
`B602:subprocess_popen_with_shell_equals_true`
Additionally, we can simplify the flow by using `check_output`
while redirecting `stderr` to `STDOUT`, and catching the
`CalledProcessError` when the command fails.
This also adds some unit tests for this check.
See also: https://security.openstack.org/guidelines/dg_use-subprocess-securely.html
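
The pattern described above can be sketched as follows. This is a minimal illustration rather than the exact code of the check: `run_command` and `ping_args` are hypothetical helper names, and the `-c` flag assumes a Unix-like `ping`.

```python
import subprocess


def ping_args(address, count=1):
    """Build the argument list; the address stays a single argument."""
    return ["ping", "-c", str(count), address]


def run_command(args):
    """Run a command given as a list of strings, without a shell.

    Returns a (succeeded, output) tuple. stderr is merged into stdout so
    that the error message is available when the command fails.
    """
    try:
        output = subprocess.check_output(args, stderr=subprocess.STDOUT)
        return True, output.decode("utf-8", "replace")
    except subprocess.CalledProcessError as exc:
        # Non-zero exit code: the captured output is carried by the exception.
        return False, exc.output.decode("utf-8", "replace")


# Usage (not executed here): run_command(ping_args("8.8.8.8"))
```

Since no shell is involved, characters such as `;` in the address have no special meaning: the whole string reaches `ping` as one argument.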
This has started failing with variations on:
```
OSError: [Errno 13] Permission denied: '/usr/local/lib/python2.7/dist-packages/pytz-2017.2.dist-info'
```
Executing under sudo is hardly a solution, but a suggested workaround
per https://github.com/travis-ci/travis-ci/issues/1705
(and we are doing it already to install tox)
bashate is "a pep8 equivalent for bash scripts" by OpenStack.
https://docs.openstack.org/bashate/latest/
- Add tox target running bashate on the entire code-base
- Fix docker-entrypoint script accordingly
If a job has no build yet, then the check result was
`'NoneType' object has no attribute '__getitem__'`,
making it hard to understand what is going on.
Now we catch that issue early on and raise an Exception.
All in all this is not great code - we should decide once and for all
if get_job_status is supposed to bubble up issues as return_codes
or if we can use Exceptions.
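
A rough sketch of the early check, under the assumption that the job data is a dict shaped like the Jenkins job JSON, where `lastCompletedBuild` is `None` when nothing has been built yet. The exception class and function name are hypothetical, not the ones used in the code base.

```python
class JobHasNoBuildError(Exception):
    """Raised when a Jenkins job has never been built (hypothetical name)."""


def last_build_number(job):
    """Return the number of the last completed build.

    Indexing into a missing build is what produced the confusing
    "'NoneType' object has no attribute '__getitem__'" message, so the
    missing build is detected first and reported explicitly.
    """
    last_build = job.get("lastCompletedBuild")
    if last_build is None:
        raise JobHasNoBuildError(
            "Job %s has no build yet" % job.get("name", "<unknown>")
        )
    return last_build["number"]
```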
When using the HttpStatusCheck with a Cyrillic website, the match check was not working.
This is because Django represents model fields as UTF-8 unicode by default,
while the library's "content" is represented as a string (in Python 2).
* Extract method `_check_content_pattern` from HttpStatusCheck `_run` method
* Add content conversion to unicode if needed
* Add unit tests for the new method
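
A minimal sketch of the extracted helper, written here as a standalone function so it runs on its own. It assumes the response body is UTF-8 encoded, and it tests for `bytes` so that the same code behaves the same on Python 2 (where `str` is `bytes`) and Python 3. The real method signature may differ.

```python
import re


def check_content_pattern(pattern, content):
    """Return True if the text `pattern` matches somewhere in `content`.

    `content` may be a byte string (the raw response body) or already
    unicode text; byte strings are decoded first so that both sides of
    the match are unicode.
    """
    if isinstance(content, bytes):
        # Assumption: the body is UTF-8 encoded.
        content = content.decode("utf-8")
    return re.search(pattern, content) is not None
```

Decoding the content, rather than encoding the pattern, keeps regular expression character classes working on whole characters instead of individual bytes.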