proxmox-backup

Author	SHA1	Message	Date
Fabian Ebner	e9b9f33aee	rest server: daemon: update PID file before sending MAINPID notification There is a race upon reload, where it can happen that: 1. systemd forks off /bin/kill -HUP $MAINPID 2. Current instance forks off new one and notifies systemd with the new MAINPID. 3. systemd sets new MAINPID. 4. systemd receives SIGCHLD for the kill process (which is the current control process for the service) and reads the PID of the old instance from the PID file, resetting MAINPID to the PID of the old instance. 5. Old instance exits. 6. systemd receives SIGCHLD for the old instance, reads the PID of the old instance from the PID file once more. systemd sees that the MAINPID matches the child PID and considers the service exited. 7. systemd receives a notification from the new PID and is confused. The service won't get active, because the notification wasn't handled. To fix it, update the PID file before sending the MAINPID notification, similar to what a comment in systemd's src/core/service.c suggests: > /* Forking services may occasionally move to a new PID. > * As long as they update the PID file before exiting the old > * PID, they're fine. */ but for our Type=notify "before sending the notification" rather than "before exiting", because otherwise, the mix-up in 4. could still happen (although it might not actually be problematic without the mix-up in 6., it still seems better to avoid). Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>	2022-05-12 11:53:54 +02:00
Thomas Lamprecht	e22ad28302	GC scheduling: avoid triggering operation tracking error for upfront checks without that one gets a "failed to lookup datastore X" in the log for every datastore that is in read-only or offline maintenance mode, even if they aren't scheduled for GC anyway. Avoid that by first opening the datastore through a Lookup operation, and only re-open it as Write op once we know that GC needs to get scheduled for it. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2022-05-12 11:36:56 +02:00
Hannes Laimer	d4d730e589	proxy: rrd: skip update disk stats for offline datastores RRD update did not use lookup_datastore() and therefore bypassed the maintenance mode checks. This adds the needed check directly. Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>	2022-05-12 11:36:56 +02:00
Dominik Csapak	20814a3986	proxmox-backup-proxy: stop accept() loop on daemon shutdown On reload the old process hands over to the new process but needs to keep running until all its worker tasks are finished to avoid breaking an in-progress action like a xterm.js web shell or a backup creation/restore. During that wait time the receiving channel was already closed, but the TCP socket accept listener was still left active by mistake. That, paired with `SO_REUSEPORT` being set on the underlying socket, made the kernel choose either the old or new process for new incoming connections; both still listened for them after all, and reuse-port + multiple processes is often used as a load-balancer mechanism. As the old proxy accepted connections but didn't process them anymore, one could observe sporadic connection failures on any API call, well any new connection to the proxy, depending on which process got it assigned. The fix is to stop accepting new connections once we shut down, so poll the shutdown_future too during accept and just exit the accept-loop on shutdown. Note: Neither this part of the code nor other parts that could influence it were changed at all in recent times, so it's still unresolved why it pops up only now. Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> Co-authored-by: Wolfgang Bumiller <w.bumiller@proxmox.com> [ T: add more (root cause) info and reword a bit ] Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2022-05-02 10:31:33 +02:00
Thomas Lamprecht	9531d2c570	rust fmt for pbs src Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2022-04-14 14:03:46 +02:00
Hannes Laimer	e9d2fc9362	datastore: add check for maintenance in lookup Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>	2022-04-12 15:29:14 +02:00
Dominik Csapak	416194d799	rest-server: add option to rotate task logs by 'max_days' instead of 'max_files' and use it with the configurable: 'task_log_max_days' of the node config Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>	2022-04-06 17:12:49 +02:00
Dominik Csapak	baefc29544	rest-server: cleanup_old_tasks: improve error handling by not bubbling up most errors, and continuing on. this avoids that we stop cleaning up because e.g. one directory was missing. Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>	2022-04-06 17:10:02 +02:00
Dietmar Maurer	e705b3057f	rename cached_traffic_control.rs to traffic_control_cache.rs, improve dev docs Keep things inside crate::traffic_control_cache (do not pollute root namespace). Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>	2022-02-14 13:45:44 +01:00
Dominik Csapak	7b944ff11a	re-use PROXMOX_DEBUG env variable to control log level filter So that we can make 'log::debug' messages actually appear in the syslog. Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2022-02-04 11:21:47 +01:00
Thomas Lamprecht	af35bc8b9c	proxy: refactor gui-language logic Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2022-02-03 13:12:02 +01:00
Thomas Lamprecht	5d74f79643	proxy: rustfmt Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2022-02-03 13:12:02 +01:00
Matthias Heiserer	68811af9f9	fix #3103. node config: allow to configure default UI language This language is only used if none is set in the cookies. Signed-off-by: Matthias Heiserer <m.heiserer@proxmox.com>	2022-02-03 13:12:02 +01:00
Dominik Csapak	1993d98695	traffic-control: use SocketAddr from 'accept()' instead of getting the 'peer_addr()' from the socket. The advantage is that we must get this and thus can drop the mapping from result -> option, and can drop the testing for None and a test case Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>	2022-01-31 09:58:14 +01:00
Dietmar Maurer	d91a0f9fc9	Set MMAP_THRESHOLD to a fixed value (128K) glibc's malloc has a misguided heuristic to detect transient allocations that will just result in allocation sizes below 32 MiB never using mmap. That in turn means that those relatively big allocations are on the heap where cleanup and returning memory to the OS is harder to do and easier to be blocked by long living, small allocations at the top (end) of the heap. Observing the malloc size distribution in a file-level backup run (@size, given as allocation size range: count): [0]: 14; [1]: 25214; [2, 4): 9090; [4, 8): 12987; [8, 16): 93453; [16, 32): 30255; [32, 64): 237445; [64, 128): 32692; [128, 256): 22296; [256, 512): 16177; [512, 1K): 5139; [1K, 2K): 3352; [2K, 4K): 214; [4K, 8K): 1568; [8K, 16K): 95; [16K, 32K): 3457; [32K, 64K): 3175; [64K, 128K): 161; [128K, 256K): 453; [256K, 512K): 93; [512K, 1M): 74; [1M, 2M): 774; [2M, 4M): 319; [4M, 8M): 700; [8M, 16M): 93; [16M, 32M): 18. We see that all allocations will be on the heap, and that while most allocations are small, the relatively few big ones will still make up most of the RSS and, if blocked from being released back to the OS, result in much higher peak and average usage for the program than actually required. Avoiding the "dynamic" mmap-threshold increase algorithm and fixing it at the original default of 128 KiB reduces RSS size by factor 10-20 when running backups. With memory mappings, other mappings or the heap can never block returning the memory fully to the OS. But, the drawback of using mmap is more wasted space for unaligned or small allocation sizes, and the fact that the kernel allegedly zeros out the data before giving it to user space. The former doesn't really matter for us when using it only for allocations bigger than 128 KiB, and the latter is a trade-off, using 10 to 20 times less memory brings its own performance improvement possibilities for the whole system after all ;-) Signed-off-by: Dietmar Maurer <dietmar@proxmox.com> [ Thomas: added to comment & commit message + extra-empty-line fixes ] Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2022-01-26 14:10:54 +01:00
Fabian Grünbichler	5ee8dd784f	ciphers: improve option naming Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2022-01-14 11:02:07 +01:00
Hannes Laimer	2eba3967b2	proxy: use ciphers from config if set Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>	2022-01-14 11:02:07 +01:00
Dominik Csapak	7c069e82d1	fix #3743: extract zfs dataset io stats from /proc/spl/kstat/zfs/POOL/objset-* Recently, ZFS removed the pool global io stats from /proc/spl/kstat/zfs/POOL/io with no replacement. To gather stats about the datastores, access now the objset specific entries there. To be able to make that efficient, cache a map of dataset <-> objset ids, so that we do not have to parse all files each time. We update the cache each time we try to get the info for a dataset where we do not have a mapping. We cannot update it on datastore add/remove since that happens in the proxmox-backup daemon, while we need the info here in proxmox-backup-proxy. Sadly with this we lose the io wait metric, but it seems that this is no longer tracked in zfs at all, so nothing we can do for that. Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>	2022-01-11 08:45:55 +01:00
Fabian Grünbichler	9a37bd6c84	tree-wide: fix needless borrows found and fixed via clippy Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2021-12-30 13:55:33 +01:00
Fabian Grünbichler	a0c69902c8	fix #3763: disable renegotiation requires openssl crate with fix[0], like our packaged one. 0: https://github.com/sfackler/rust-openssl/pull/1584 Tested-by: Stoiko Ivanov s.ivanov@proxmox.com Reviewed-by: Stoiko Ivanov s.ivanov@proxmox.com Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2021-12-27 09:09:26 +01:00
Dominik Csapak	7549114c9f	adapt compute_next_event to new signature the 'utc' flag is now contained in the event itself and not given as a flag to 'compute_next_event' anymore Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>	2021-12-02 10:40:58 +01:00
Dominik Csapak	68b6c1202c	remove use of deprecated functions from proxmox-time Depend on proxmox-time 1.1.1 Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>	2021-12-01 07:23:18 +01:00
Dietmar Maurer	25877d05ac	update to proxmox-sys 0.2 crate - imported pbs-api-types/src/common_regex.rs from old proxmox crate - use hex crate to generate/parse hex digest - remove all reference to proxmox crate (use proxmox-sys and proxmox-serde instead) Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>	2021-11-24 10:32:27 +01:00
Dietmar Maurer	9a1b24b6b1	use new proxmox-async crate Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>	2021-11-19 18:03:22 +01:00
Dietmar Maurer	d5790a9f27	use new proxmox-sys crate Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>	2021-11-19 11:06:35 +01:00
Dietmar Maurer	15cc41b6cb	proxmox-systemd: remove crate, use new proxmox-time 1.1.0 instead Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>	2021-11-17 13:07:51 +01:00
Dietmar Maurer	a0172d766b	traffic-controls: add API/CLI to show current traffic Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>	2021-11-14 17:21:45 +01:00
Dietmar Maurer	d5f58006d3	cached_traffic_control: use ShareableRateLimit trait object	2021-11-13 17:49:38 +01:00
Dietmar Maurer	e511e0e553	proxmox-backup-proxy: implement traffic control Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>	2021-11-10 10:15:40 +01:00
Dietmar Maurer	98eb435d90	proxmox-rrd: use syncfs after writing rrd files Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>	2021-10-19 11:17:09 +02:00
Dietmar Maurer	fae4f6c509	cleanup: move rrd cache related code into extra file	2021-10-14 07:57:27 +02:00
Dietmar Maurer	1198f8d4e6	proxmox-rrd: implement new CBOR based format Storing much more data points now to get better graphs. Signed-off-by: Dietmar Maurer <dietmar@proxmox.com> Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2021-10-13 13:36:02 +02:00
Dietmar Maurer	4b709ade68	proxmox-backup-proxy: use tokio::task::spawn_blocking instead of block_in_place allow the current thread to do some other work in-between Signed-off-by: Dietmar Maurer <dietmar@proxmox.com> Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2021-10-13 13:36:02 +02:00
Dietmar Maurer	fa49d0fde9	RRD_CACHE: use a OnceCell instead of lazy_static And initialize only with proxmox-backup-proxy. Other binaries don't need it. Signed-off-by: Dietmar Maurer <dietmar@proxmox.com> Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2021-10-13 13:36:02 +02:00
Dietmar Maurer	1d44f175c6	proxmox-rrd: use a journal to reduce amount of bytes written Append pending changes in a simple text based format that allows for lockless appends as long as we stay below 4 KiB data per write. Apply the journal every 30 minutes and on daemon startup. Note that we do not ensure that the journal is synced, this is a performance optimization we can make as the kernel defaults to write back in-flight data every 30s (sysctl vm/dirty_expire_centisecs) anyway, so we lose at max half a minute of data on a crash, here one should have in mind that we normally expose 1 minute as finest granularity anyway, so not really much lost. Signed-off-by: Dietmar Maurer <dietmar@proxmox.com> Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2021-10-13 13:36:02 +02:00
Dominik Csapak	75442e813e	api daemons: fix sending log-reopen command send_command serializes everything so it cannot be used to send a raw, optimized command. Normally that means we get an error like > 'unable to parse parameters (expected json object)' when used that way. Switch over to send_raw_command which does not re-serialize the command. Fixes: `45b8a032` ("refactor send_command") Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2021-10-11 14:35:50 +02:00
Wolfgang Bumiller	6ef1b649d9	update to first proxmox crate split Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>	2021-10-11 11:58:49 +02:00
Dominik Csapak	0e1edf19b1	proxmox-backup-proxy: clean up old tasks when the task log was rotated we maybe have old tasks when the task list was rotated, so clean them up Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>	2021-10-08 06:47:38 +02:00
Dietmar Maurer	09340f28f5	move RRD code into proxmox-rrd crate	2021-10-06 08:13:28 +02:00
Dietmar Maurer	608806e884	proxmox-rest-server: use new ServerAdapter trait instead of callbacks Async callbacks are a PITA, so we now pass a single trait object which implements check_auth and get_index.	2021-10-05 11:13:10 +02:00
Dietmar Maurer	48176b0a77	proxmox-rest-server: pass owned RestEnvironment to get_index This way we avoid pointers with lifetimes.	2021-10-05 11:12:53 +02:00
Dominik Csapak	0a6df20986	rest-server/daemon: use sd_notify_barrier for service reloading Until now, we manually polled the systemd service state during a reload so that the sd_notify messages get processed in the correct order (RELOAD(old) -> MAINPID(old) -> READY(new)). With systemd >= 246 there is now 'sd_notify_barrier', which blocks until systemd processed all prior messages. With that change, the daemon does not need to know the service name anymore. Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2021-10-02 11:44:20 +02:00
Dietmar Maurer	6680878b5c	proxmox-rest-server: make get_index async	2021-10-01 09:38:10 +02:00
Dietmar Maurer	49e25688f1	rename CommandoSocket to CommandSocket	2021-09-30 12:52:35 +02:00
Dietmar Maurer	0d5d15c9d1	proxmox-rest-server: improve docs And rename enable_file_log to enable_access_log.	2021-09-30 12:29:15 +02:00
Dietmar Maurer	fd1b65cc3c	proxmox-rest-server: allow to catch SIGINT and SIGHUP separately And make ServerState private.	2021-09-30 08:41:30 +02:00
Dietmar Maurer	38da8ca1bc	proxmox-rest-server: improve logging And rename server_state_init() into catch_shutdown_and_reload_signals().	2021-09-29 14:48:46 +02:00
Dietmar Maurer	bf95fba72e	remove wrong calls to systemd_notify We already call systemd_notify inside the create_service future.	2021-09-29 12:04:48 +02:00
Dietmar Maurer	d265420025	daemon: simplify code (make it easier to use)	2021-09-29 12:04:48 +02:00
Dietmar Maurer	6d5d305d9d	move src/backup/datastore.rs into pbs_datastore crate	2021-09-27 09:11:38 +02:00
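
The ordering described in commit e9b9f33aee (update the PID file first, only then send the MAINPID notification) can be illustrated with a minimal, std-only Rust sketch. This is not the code from the repository: the function names (write_pid_file, send_notification, announce_new_main_pid) are hypothetical, the notification is sent as a raw datagram rather than through the project's own systemd helpers, and the socket path is passed in explicitly instead of being read from the environment.

```rust
use std::fs;
use std::io;
use std::os::unix::net::UnixDatagram;
use std::path::Path;

/// Replace the PID file by writing a temporary file and renaming it over
/// the target, so a reader never sees a half-written file.
/// (Hypothetical helper, not taken from the repository.)
fn write_pid_file(path: &Path, pid: u32) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, format!("{pid}\n"))?;
    fs::rename(&tmp, path)
}

/// Send one sd_notify-style datagram to the given notification socket.
/// Assumption: the socket is a regular filesystem socket; abstract
/// sockets (names starting with '@') are not handled in this sketch.
fn send_notification(notify_socket: &Path, msg: &str) -> io::Result<()> {
    let sock = UnixDatagram::unbound()?;
    sock.send_to(msg.as_bytes(), notify_socket)?;
    Ok(())
}

/// Announce a new main PID in the order the commit message argues for:
/// 1. update the PID file, 2. only then send the MAINPID notification.
/// If the service manager re-reads the PID file at any later point, it
/// then finds the new PID rather than the old one.
fn announce_new_main_pid(
    pid_file: &Path,
    notify_socket: &Path,
    new_pid: u32,
) -> io::Result<()> {
    write_pid_file(pid_file, new_pid)?;
    send_notification(notify_socket, &format!("MAINPID={new_pid}"))
}
```

In a real Type=notify service the socket path would normally come from the NOTIFY_SOCKET environment variable set by systemd. The point of the sketch is only the order of the two steps inside announce_new_main_pid.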


225 Commits