89 Commits

Author SHA1 Message Date
Ivan Folgueira Bande
40b780cb0e
requirements: bump nim-waku to remove rest-private parameter
This is needed because the parameter is no longer supported from
nwaku v0.33.0 onwards.

Signed-off-by: Ivan Folgueira Bande <ivansete@status.im>
2024-09-19 16:35:06 +02:00
markoburcul
3fdecedd83
inventory: Apply updated terraform script
Update terraform script and apply it to get updated version of ansible
inventory file.

Referenced issue: https://github.com/status-im/infra-template/issues/10

Signed-off-by: markoburcul <marko@status.im>
2024-09-13 11:16:45 +02:00
Anton Iakimov
f8c8dac98d
boot,store,store-db: switch to nftables
https://github.com/status-im/infra-misc/issues/301
2024-09-12 16:40:48 +02:00
901a62f455
store,boot: bump max connections from 300 to 500
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-09-05 18:17:16 +02:00
Alexis Pentori
ecf29207bd
store-db: increase container shared memory
Increase shared memory to allow vacuuming

Signed-off-by: Alexis Pentori <alexis@status.im>
2024-09-05 14:40:32 +02:00
Alexis Pentori
720f663dbd
vault: adding lookup and env variables
2024-09-05 11:07:28 +02:00
5649191b4f
all: add Harbor Docker registry credentials
Otherwise hosts create a lot of `/v2/` calls that fail with 401.

Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-09-03 09:31:31 +02:00
Ivan Folgueira Bande
d8d4d5d890
store: debug log level for all stages
Setting the log level to trace severely degraded the node's performance
and increased disk usage. It wasn't a great idea.

Signed-off-by: Ivan Folgueira Bande <ivansete@status.im>
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-08-29 09:17:46 +02:00
Alexis Pentori
6a730e7d4d
logrotate: update frequency
Signed-off-by: Alexis Pentori <alexis@status.im>
2024-08-28 09:47:36 +02:00
Ivan Folgueira Bande
96ec175dd6
store staging: set nim_waku_log_level to trace
2024-08-26 15:32:51 +02:00
Ivan Folgueira Bande
46146c87b0
store-db: update db settings to new hardware 16GB RAM and 8 CPUs
2024-08-16 17:30:02 +02:00
Ivan Folgueira Bande
87d2fc1605
store-db: disable autovacuum by default
The biggest table, messages, is a partitioned one that only receives
INSERTs. Furthermore, there are no dead tuples there because we
directly drop old partitions.

We may need to manually perform vacuum on other tables.
2024-08-16 17:29:37 +02:00
f32d99fb06
store-db: increase consul alert thresholds
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-08-11 15:24:26 +02:00
a2788b0f0b
requirements: bump nim-waku and certbot
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-08-11 15:05:03 +02:00
491b6d37b6
boot: increase consul check interval, warning threshold
To match settings for `store` nodes.

Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-08-11 15:00:14 +02:00
Alexis Pentori
af9162fc2e
store-db: applying postgres system setting based on the stage
Signed-off-by: Alexis Pentori <alexis@status.im>
2024-08-08 12:23:43 +02:00
Ivan Folgueira Bande
f85fc71b50
store-db: add more appropriate db settings for current db hw
Signed-off-by: Ivan Folgueira Bande <ivansete@status.im>
2024-08-07 13:27:29 +02:00
stubbsta
e6b39e4b8f
store-db: add SSH access for tanya@status.im
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-08-02 09:12:59 +02:00
Hanno Cornelius
91117c823c
store-db: add SSH access for hanno@status.im
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-07-26 09:08:33 +02:00
eb8045326c
boot,store: fix ENRTree DNS entry for status.prod
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-07-25 09:14:40 +02:00
7df38c149d
rename the shards.test fleet to status.prod
https://github.com/status-im/infra-shards/issues/33

Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-07-24 12:13:50 +02:00
55b31f42f5
all: do not send trace level logs to logstash
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-07-23 12:12:54 +02:00
7ef357f9e9
requirements: bump certbot and postgres-ha
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-07-23 10:28:15 +02:00
5591327ea3
store: lower staging retention to 75 GB to avoid alerts
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-07-17 09:26:42 +02:00
770dad967e
store,boot: fix Docker tag name
We are in the middle of renaming fleets and this will make it more
robust.

Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-07-16 16:55:32 +02:00
749c281209
store-db: fix variable name for Postgres Alter System
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-07-16 13:42:51 +02:00
f554fe7185
store-db: set autovacuum_work_mem to 10% of memory
We have seen host crashes caused by PostgreSQL using up all memory by
trying to run `autovacuum` workers.

Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-07-16 13:42:50 +02:00
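The 10% rule in the commit above can be sketched as plain arithmetic. This is a minimal illustration, not the role's actual logic: the function name and the 16 GiB host size are assumptions made for the example. Note that PostgreSQL applies `autovacuum_work_mem` per autovacuum worker, so the combined ceiling also depends on `autovacuum_max_workers` (3 by default).

```python
# Hypothetical helper; the name and the example host size are assumptions.
def autovacuum_work_mem_kb(total_memory_kb: int, percent: int = 10) -> int:
    """Return the given percentage of total memory, in kB (integer math)."""
    return total_memory_kb * percent // 100


total_kb = 16 * 1024 * 1024           # assumed 16 GiB host, expressed in kB
per_worker_kb = autovacuum_work_mem_kb(total_kb)
default_workers = 3                   # PostgreSQL default autovacuum_max_workers

print(f"autovacuum_work_mem = {per_worker_kb}kB")
print(f"worst case across workers = {per_worker_kb * default_workers}kB")
```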
97544ad634
store: set retention policy using size instead of time
Using time causes the DB to be filled quickly.

Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-07-09 11:16:38 +02:00
a0ad0410d9
ansible: apply roles.py fixes
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-07-05 11:44:42 +02:00
040b9d4949
rename shards fleet to status fleet
While also retaining the old domain names.

Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-07-03 22:00:29 +02:00
b1da421448
boot: uncomment setting for boot node key
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-07-03 01:19:08 +02:00
Ivan Folgueira Bande
062cb6d51a
set max_locks_per_transaction to 2160
We use partitions in our Postgres DBs, with one partition per
hour (24 partitions per day).

The default max_locks_per_transaction value (64) can cause
"out of memory" errors and blocking issues in the DB because we
usually have more than 64 partitions.

With 2160 we aim to avoid that issue for a time retention policy of
90 days (2160 == 90*24). We usually use a time retention policy of
30 days in our Status fleets, so this just adds some extra margin.
2024-06-26 13:15:24 +02:00
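The sizing argument in the commit above reduces to a short calculation. The sketch below only restates the numbers from the commit message; the helper name is hypothetical and not part of the repository.

```python
HOURS_PER_DAY = 24               # one partition per hour, per the commit message
POSTGRES_DEFAULT_MAX_LOCKS = 64  # PostgreSQL default max_locks_per_transaction


# Hypothetical helper, named for illustration only.
def partitions_for_retention(retention_days: int) -> int:
    """Number of hourly partitions kept for a time-based retention window."""
    return retention_days * HOURS_PER_DAY


for days in (30, 90):
    partitions = partitions_for_retention(days)
    exceeds = partitions > POSTGRES_DEFAULT_MAX_LOCKS
    print(f"{days} days -> {partitions} partitions, exceeds default: {exceeds}")
```

Both retention windows produce far more partitions than the default of 64, which is why the setting is sized for the larger 90-day case.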
a7e9cb6e30
ansible/roles.py: fix pull call to handle up-to-date repo
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-06-24 08:50:45 +02:00
9fce0e4211
ansible: add roles.py script to manage roles
https://github.com/status-im/infra-template/pull/5
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-06-13 17:17:56 +02:00
6ae82c6c09
requirements: bump postgres-ha role
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-06-13 17:17:42 +02:00
Alexis Pentori
7178fc4d83
store: set logrotate frequency to hourly
Signed-off-by: Alexis Pentori <alexis@status.im>
2024-06-10 10:31:15 +02:00
89c487dfcc
requirements: bump nim-waku to use new compose module
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-06-03 14:19:59 +02:00
Ivan Folgueira Bande
dc9b6f5a81
boot: set max msg size to 1024KiB to fit store nodes
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-06-03 10:21:08 +02:00
e1b4be4a24
store: uncomment nim_waku_node_key variable
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-05-23 18:10:15 +02:00
aa3e653a53
store: lower sensitivity of consul healthchecks
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-05-12 10:30:30 +02:00
e7b1cdcb85
lookup_plugins/bitwarden: ignore stderr
Otherwise we get weird JSON parsing errors:
```
An unhandled exception occurred while running the lookup plugin 'bitwarden'.
Error was a <class 'json.decoder.JSONDecodeError'>, original message:
Extra data: line 1 column 843 (char 842). Extra data: line 1 column 843 (char 842)
```

Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-05-07 14:55:51 +02:00
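The failure mode described in the commit above can be reproduced in isolation. The sketch below does not show the plugin's real code: it uses a stand-in command (a Python one-liner) in place of the Bitwarden CLI, and the function names are assumptions. It shows that JSON parsing fails with an "Extra data" error when stderr is mixed into stdout, and succeeds when stderr is discarded.

```python
import json
import subprocess
import sys

# Stand-in for a CLI that prints JSON on stdout and a warning on stderr.
FAKE_CLI = [
    sys.executable, "-c",
    "import sys; print('{\"id\": 1}'); sys.stdout.flush(); "
    "print('warning: something unrelated', file=sys.stderr)",
]


def run_cli(merge_stderr: bool) -> str:
    """Run the stand-in CLI, either merging stderr into stdout or dropping it."""
    stderr = subprocess.STDOUT if merge_stderr else subprocess.DEVNULL
    result = subprocess.run(
        FAKE_CLI, stdout=subprocess.PIPE, stderr=stderr, text=True, check=True
    )
    return result.stdout


def parses_as_json(text: str) -> bool:
    """Return True if the whole text is a single valid JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print("stderr merged: ", parses_as_json(run_cli(merge_stderr=True)))
print("stderr ignored:", parses_as_json(run_cli(merge_stderr=False)))
```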
Anton Iakimov
f39afef54d
boot: logrotate hourly due to lots of DBG logs
2024-04-24 16:01:04 +02:00
883893f547
deploy new shards.staging fleet
https://github.com/status-im/infra-shards/issues/29

Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-03-18 20:48:58 +01:00
4ef143ed20
ansible/main: run DB setup before node setup
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-03-18 20:06:16 +01:00
f116eef7ce
requirements: bump certbot to fix init
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-03-18 18:39:43 +01:00
3f5c9ea4cb
store: drop temporary image lock for store-02.gc
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-03-15 19:15:59 +01:00
3c60a6dcde
boot,store: go back to using proper deploy branches
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-03-15 14:27:38 +01:00
74be1115c6
boot,store: use both new and old domain names
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-03-15 14:27:37 +01:00
c87a3310ac
ansible/inventory: update to use status.im domain
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-03-14 22:57:09 +01:00
81850e6466
requirements: use full names for all roles
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-03-14 21:56:06 +01:00