infra-shards

Commit Graph

Author	SHA1	Message	Date
Alexis Pentori	10457bfb7d	ansible/lookup/vault: use ansible variable Signed-off-by: Alexis Pentori <alexis@status.im>	2024-09-26 11:51:33 +02:00
Jakub Sokołowski	5096a10bb9	ansible/lookup/bitwarden: sync with template	2024-09-24 08:49:03 +02:00
Alexis Pentori	85ffda1131	vault: update bitwarden plugin Signed-off-by: Alexis Pentori <alexis@status.im>	2024-09-20 15:16:27 +02:00
Alexis Pentori	8c19ec8e40	flake: open nix flake devShell automatically with direnv (nix-direnv) see infra-templates: - 32a8552eaf0347c217fa7d80572b06d5cd90243d - a1b9500b5dcd114d45571e8122459f0e5aca9de2 Signed-off-by: Alexis Pentori <alexis@status.im>	2024-09-20 15:15:38 +02:00
Ivan Folgueira Bande	40b780cb0e	requirements: bump nim-waku to remove rest-private parameter This is needed because the parameter is no longer supported from nwaku v0.33.0 onwards. Signed-off-by: Ivan Folgueira Bande <ivansete@status.im>	2024-09-19 16:35:06 +02:00
markoburcul	3fdecedd83	inventory: Apply updated terraform script Update terraform script and apply it to get updated version of ansible inventory file. Referenced issue: https://github.com/status-im/infra-template/issues/10 Signed-off-by: markoburcul <marko@status.im>	2024-09-13 11:16:45 +02:00
Anton Iakimov	f8c8dac98d	boot,store,store-db: switch to nftables https://github.com/status-im/infra-misc/issues/301	2024-09-12 16:40:48 +02:00
Jakub Sokołowski	901a62f455	store,boot: bump max connections from 300 to 500 Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-09-05 18:17:16 +02:00
Alexis Pentori	ecf29207bd	store-db: increase container shared memory Increase shared memory to allow vacuuming Signed-off-by: Alexis Pentori <alexis@status.im>	2024-09-05 14:40:32 +02:00
Alexis Pentori	720f663dbd	vault: adding lookup and env variables	2024-09-05 11:07:28 +02:00
Jakub Sokołowski	5649191b4f	all: add Harbor Docker registry credentials Otherwise hosts create a lot of `/v2/` calls that fail with 401. Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-09-03 09:31:31 +02:00
Ivan Folgueira Bande	d8d4d5d890	store: debug log level for all stages Having the log level in trace severely damaged the node's performance and disk usage. It wasn't a great idea. Signed-off-by: Ivan Folgueira Bande <ivansete@status.im> Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-08-29 09:17:46 +02:00
Jakub Sokołowski	7d43513dfe	status.staging: double size of DB hosts Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-08-28 15:36:23 +02:00
Jakub Sokołowski	abb457b196	status.prod: double size of all hosts before release Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-08-28 11:18:19 +02:00
Alexis Pentori	6a730e7d4d	logrotate: update frequency Signed-off-by: Alexis Pentori <alexis@status.im>	2024-08-28 09:47:36 +02:00
Ivan Folgueira Bande	96ec175dd6	store staging: set nim_waku_log_level to trace	2024-08-26 15:32:51 +02:00
Ivan Folgueira Bande	46146c87b0	store-db: update db settings to new hardware 16GB RAM and 8 CPUs	2024-08-16 17:30:02 +02:00
Ivan Folgueira Bande	87d2fc1605	store-db: disable autovacuum by default The biggest table, messages, is a partitioned one where only INSERTs happen. Furthermore, there are no dead tuples there because we directly drop old partitions. We may need to manually perform vacuum on other tables	2024-08-16 17:29:37 +02:00
Jakub Sokołowski	fb62c2e7cf	store-db: double the size of prod DB hosts again Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-08-14 15:42:54 +02:00
Jakub Sokołowski	f32d99fb06	store-db: increase consul alert thresholds Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-08-11 15:24:26 +02:00
Jakub Sokołowski	a2788b0f0b	requirements: bump nim-waku and certbot Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-08-11 15:05:03 +02:00
Jakub Sokołowski	491b6d37b6	boot: increase consul check interval, warning threshold To match settings for `store` nodes. Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-08-11 15:00:14 +02:00
Alexis Pentori	af9162fc2e	store-db: applying postgres system setting based on the stage Signed-off-by: Alexis Pentori <alexis@status.im>	2024-08-08 12:23:43 +02:00
Ivan Folgueira Bande	f85fc71b50	store-db: add more appropriate db settings for current db hw Signed-off-by: Ivan Folgueira Bande <ivansete@status.im>	2024-08-07 13:27:29 +02:00
stubbsta	e6b39e4b8f	store-db: add SSH access for tanya@status.im Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-08-02 09:12:59 +02:00
Jakub Sokołowski	76788ba471	store-db: double the size of prod DB hosts Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-07-31 16:32:57 +02:00
Hanno Cornelius	91117c823c	store-db: add SSH access for hanno@status.im Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-07-26 09:08:33 +02:00
Jakub Sokołowski	eb8045326c	boot,store: fix ENRTree DNS entry for status.prod Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-07-25 09:14:40 +02:00
Jakub Sokołowski	b46fe5f4bf	readme: fix ENRTree DNS record for status.prod Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-07-24 17:04:49 +02:00
Jakub Sokołowski	7df38c149d	rename the shards.test fleet to status.prod https://github.com/status-im/infra-shards/issues/33 Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-07-24 12:13:50 +02:00
Jakub Sokołowski	55b31f42f5	all: do not send trace level logs to logstash Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-07-23 12:12:54 +02:00
Jakub Sokołowski	7ef357f9e9	requirements: bump certbot and postgres-ha Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-07-23 10:28:15 +02:00
Jakub Sokołowski	5591327ea3	store: lower staging retention to 75 GB to avoid alerts Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-07-17 09:26:42 +02:00
Jakub Sokołowski	d66bb10326	store-db: bump data volume from 300 to 310 GB Otherwise we trigger alert for less than 15% disk space left. Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-07-17 09:26:41 +02:00
Jakub Sokołowski	770dad967e	store,boot: fix name of Docker tag name We are in the middle of renaming fleets and this will make it more robust. Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-07-16 16:55:32 +02:00
Jakub Sokołowski	749c281209	store-db: fix variable name for Postgres Alter System Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-07-16 13:42:51 +02:00
Jakub Sokołowski	f554fe7185	store-db: set autovacuum_work_mem to 10% of memory We have seen host crashes caused by PostgreSQL using up all memory by trying to run `autovacuum` workers. Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-07-16 13:42:50 +02:00
Jakub Sokołowski	97544ad634	store: set retention policy using size instead of time Using time causes the DB to be filled quickly. Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-07-09 11:16:38 +02:00
Jakub Sokołowski	2988df6c5b	store: bump data volume from 250 GB to 300 GB The garbage doesn't stop flowing. Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-07-09 11:15:43 +02:00
Jakub Sokołowski	8bb033cf6c	flake: add flake.nix and lock Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-07-05 13:40:30 +02:00
Jakub Sokołowski	a0ad0410d9	ansible: apply roles.py fixes Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-07-05 11:44:42 +02:00
Jakub Sokołowski	45f83b0039	store-db: bump data volume from 150 GB to 250 GB Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-07-04 16:54:24 +02:00
Jakub Sokołowski	040b9d4949	rename shards fleet to status fleet While also retaining the old domain names. Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-07-03 22:00:29 +02:00
Jakub Sokołowski	b1da421448	boot: uncomment setting for boot node key Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-07-03 01:19:08 +02:00
Ivan Folgueira Bande	062cb6d51a	set max_locks_per_transaction to 2160 We are using partitions in our postgres DBs. And we have one partition per hour (24 partitions per day.) The default max_locks_per_transaction value (64) can cause "out of memory" and block issues in the DB because we usually have more than 64 partitions. With 2160 we aim to avoid that issue for 90 days (2160 == 90*24.) if we consider a time retention policy of 90 days. Nevertheless, we usually have time retention policy of 30 days in our Status fleets, but we are just adding some extra margin.	2024-06-26 13:15:24 +02:00
Jakub Sokołowski	46c7a759b9	versions.tf: upgrade pass provider to 2.1.1 Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-06-24 13:29:35 +02:00
Jakub Sokołowski	a7e9cb6e30	ansible/roles.py: fix pull call to handle up-to-date repo Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-06-24 08:50:45 +02:00
Jakub Sokołowski	9fce0e4211	ansible: add roles.py script to manage roles https://github.com/status-im/infra-template/pull/5 Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-06-13 17:17:56 +02:00
Jakub Sokołowski	6ae82c6c09	requirements: bump postgres-ha role Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-06-13 17:17:42 +02:00
Jakub Sokołowski	a8162303d8	store-db: double size of hosts to handle big queries We've been experiencing extremely high average load reaching 30-40, most probably due to unoptimized queries. Doubling hosts to at least allow easier debugging of issues. Signed-off-by: Jakub Sokołowski <jakub@status.im>	2024-06-12 17:22:13 +02:00

115 Commits