proxmox-backup

Commit Graph

Author	SHA1	Message	Date
Thomas Lamprecht	2e1b63fb25	backup verify: do not check every loop iteration for abort/shutdown only check every 1024'th, which is cheaper to do than a modulo, as we can just mask the 10 least-significant-bits and check if the result is zero. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2021-04-15 13:21:36 +02:00
Thomas Lamprecht	7b2d3a5fe9	backup verify: unify check if chunk can be skipped This also re-checks the corrupt chunk list before actually loading a chunk. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2021-04-15 13:21:07 +02:00
Thomas Lamprecht	26af61debc	backup verify: re-check if we can skip a chunk in the actual verify loop Fixes a non-negligible performance regression from commit `7f394c807b`. While we skip known-verified chunks in the stat-and-inode-sort loop, those are only the ones from previous indexes. If there's a repeated chunk in one index it would get re-verified more often than required. So, add the check again explicitly to the read+verify loop. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2021-04-15 10:00:06 +02:00
Thomas Lamprecht	2ab12cd0cb	verify: add comment for inode sorting Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2021-04-14 14:39:24 +02:00
Thomas Lamprecht	c894909e17	verify: partially rust fmt Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2021-04-14 14:39:24 +02:00
Dominik Csapak	7f394c807b	backup/verify: improve speed by sorting chunks by inode before reading the chunks from disk in the order of the index file, stat them first and sort them by inode number. this can have a very positive impact on read speed on spinning disks, even with the additional stat'ing of the chunks. memory footprint should be tolerable, for 1_000_000 chunks we need about ~16MiB of memory (Vec of 64bit position + 64bit inode) (assuming 4MiB Chunks, such an index would reference 4TiB of data) two small benchmarks (single spinner, ext4) here showed an improvement from ~430 seconds to ~330 seconds for a 32GiB fixed index and from ~160 seconds to ~120 seconds for a 10GiB dynamic index Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>	2021-04-14 14:39:24 +02:00
Fabian Grünbichler	9c26a3d61a	verify: factor out common parameters all the verify methods pass along the following: - task worker - datastore - corrupt and verified chunks might as well pull that out into a common type, with the added bonus of now having a single point for construction instead of copying the default capacities in three different modules. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2021-01-26 09:54:49 +01:00
Fabian Grünbichler	397356096a	clippy: remove needless bool literals Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2021-01-20 16:23:52 +01:00
Fabian Grünbichler	7e25b9aaaa	verify: use same progress as pull percentage of verified groups, interpolating based on snapshot count within the group. in most cases, this will also be closer to 'real' progress since added snapshots (those which will be verified) in active backup groups will be roughly evenly distributed, while number of total snapshots per group will be heavily skewed towards those groups which have existed the longest, even though most of those old snapshots will only be re-verified very infrequently. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2020-12-01 06:22:55 +01:00
Fabian Grünbichler	7f3b0f67e7	remove BackupGroup::list_groups BackupInfo::list_backup_groups is identical code-wise, and makes more sense as entry point for listing groups. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2020-12-01 06:09:44 +01:00
Fabian Grünbichler	9f9a661b1a	verify: cleanup logging order/messages otherwise we end up printing warnings before the start message.. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2020-11-10 14:11:36 +01:00
Fabian Grünbichler	1b1cab8321	verify: log/warn on invalid owner in order to trigger a notification/make the problem more visible than just in syslog. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2020-11-10 14:11:36 +01:00
Fabian Grünbichler	414c23facb	fix #3060:: improve get_owner error handling log invalid owners to system log, and continue with next group just as if permission checks fail for the following operations: - verify store with limited permissions - list store groups - list store snapshots all other call sites either handle it correctly already (sync/pull), or operate on a single group/snapshot and can bubble up the error. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2020-11-10 12:58:44 +01:00
Fabian Grünbichler	09f6a24078	verify: introduce & use new Datastore.Verify privilege for verifying a whole datastore. Datastore.Backup now allows verifying only backups owned by the triggering user. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2020-10-30 16:36:52 +01:00
Dietmar Maurer	d771a608f5	verify: directly pass manifest to filter function In order to avoid loading the manifest twice during verify.	2020-10-29 07:59:19 +01:00
Thomas Lamprecht	b4b14dc16e	do_verification_job: fix "never-reverify" and refactor/comment commit `a4915dfc2b` made a wrong fix, as it did not observe that the last expression was done under the invariant that we had a last verification result, because if none could be loaded we already returned true (include). It thus broke the case for "never re-verify", which is important when using multiple schedules, a higher-frequency one for new, unverified snapshots, and a low-frequency one to re-verify older snapshots, e.g., monthly. Fix this case again, rework the code to avoid this easy-to-overlook invariant. Use a nested match to better express the implication of each setting, and add some comments. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2020-10-28 16:12:09 +01:00
Dietmar Maurer	328df3b507	verify: avoid generics and use &dyn Fn() for filter	2020-10-28 13:19:21 +01:00
Dietmar Maurer	a4915dfc2b	verify: improve code reuse, fix filter function Try to reuse verify_all_backups(), because this function has better logging and well defined snapshot order.	2020-10-28 12:58:15 +01:00
Dietmar Maurer	1298618a83	move jobstate to server	2020-10-28 07:37:01 +01:00
Hannes Laimer	8d1beca7e8	api2: add verification admin endpoint and do_verification_job function Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>	2020-10-21 12:51:35 +02:00
Stefan Reiter	bcc2880461	add verify_backup_dir_with_lock for callers already holding locks Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>	2020-10-20 10:49:19 +02:00
Stefan Reiter	1a374fcfd6	datastore: add manifest locking Avoid races when updating manifest data by flocking a lock file. update_manifest is used to ensure updates always happen with the lock held. Snapshot deletion also acquires the lock, so it cannot interfere with an outstanding manifest write. Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>	2020-10-16 09:34:12 +02:00
Stefan Reiter	883aa6d5a4	datastore: remove load_manifest_json There's no point in having that as a separate method, just parse the thing into a struct and write it back out correctly. Also makes further changes to the method simpler. Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>	2020-10-15 07:19:32 +02:00
Stefan Reiter	bfa54f2e85	verify: acquire shared snapshot flock and skip on error If we can't acquire a lock (either because the snapshot disappeared, it is about to be forgotten/pruned, or it is currently still running) skip the snapshot. Hold the lock during verification, so that it cannot be deleted while we are still verifying. Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>	2020-10-15 07:09:34 +02:00
Wolfgang Bumiller	f6b1d1cc66	don't require WorkerTask in backup/ To untangle the server code from the actual backup implementation. It would be ideal if the whole backup/ dir could become its own crate with minimal dependencies, certainly without depending on the actual api server. That would then also be used more easily to create forensic tools for all the data file types we have in the backup repositories. Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>	2020-10-12 14:11:57 +02:00
Dietmar Maurer	a71bc08ff4	src/tools/parallel_handler.rs: remove lifetime hacks, require 'static In theory, one can do std::mem::forget, and ignore the drop handler. With the lifetime hack, this could result in a crash. So we simply require 'static lifetime now (futures also needs that).	2020-10-01 14:52:48 +02:00
Dietmar Maurer	f21508b9e1	src/backup/verify.rs: use ParallelHandler to verify chunks	2020-09-26 11:14:37 +02:00
Dietmar Maurer	ee7a308de4	src/backup/verify.rs: cleanup use clause	2020-09-26 10:23:44 +02:00
Stefan Reiter	d10332a15d	SnapshotVerifyState: use enum for state Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>	2020-09-15 13:06:04 +02:00
Dietmar Maurer	5656888cc9	verify: fix done count We need to filter out benchmark group earlier	2020-09-10 09:06:33 +02:00
Dietmar Maurer	5fdc5a6f3d	verify: skip benchmark directory	2020-09-10 08:44:18 +02:00
Dietmar Maurer	aadcc2815c	cleanup rename_corrupted_chunk: avoid duplicate format macro	2020-09-08 12:29:53 +02:00
Stefan Reiter	0f3b7efa84	verify: rename corrupted chunks with .bad extension This ensures that following backups will always upload the chunk, thereby replacing it with a correct version again. Format for renaming is <digest>.<counter>.bad where <counter> is used if a chunk is found to be bad again before a GC cleans it up. Care has been taken to deliberately only rename a chunk in conditions where it is guaranteed to be an error in the chunk itself. Otherwise a broken index file could lead to an unwanted mass-rename of chunks. Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>	2020-09-08 12:20:57 +02:00
Stefan Reiter	7c77e2f94a	verify: fix log units Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>	2020-09-08 12:10:19 +02:00
Dietmar Maurer	deef63699e	verify: also fail on server shutdown	2020-09-02 09:50:17 +02:00
Dietmar Maurer	63d9aca96f	verify: log progress	2020-09-02 07:43:28 +02:00
Dietmar Maurer	4f09d31085	src/backup/verify.rs: use global hashes (instead of per group) This makes verify more predictable.	2020-09-01 13:33:04 +02:00
Dietmar Maurer	6b809ff59b	src/backup/verify.rs: use separate thread to load data	2020-09-01 12:56:25 +02:00
Thomas Lamprecht	3b2046d263	save last verify result in snapshot manifest Save the state ("ok" or "failed") and the UPID of the respective verify task. With this we can easily allow to open the relevant task log and show when the last verify happened. As we already load the manifest when listing the snapshots, just add it there directly. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2020-08-26 07:35:13 +02:00
Dietmar Maurer	7ae571e7cb	verify: speedup - only verify chunks once We need to do the check before we load the chunk.	2020-08-25 08:52:24 +02:00
Dietmar Maurer	4264c5023b	verify: sort backup groups	2020-08-25 08:38:47 +02:00
Fabian Grünbichler	9a38fa29c2	verify: also check chunk CryptMode and in-line verify_stored_chunk to avoid double-loading each chunk. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2020-08-11 09:56:20 +02:00
Fabian Grünbichler	8819d1f2f5	blobs: attempt to verify on decode when possible regular chunks are only decoded when their contents are accessed, in which case we need to have the key anyway and want to verify the digest. for blobs we need to verify beforehand, since their checksums are always calculated based on their raw content, and stored in the manifest. manifests are also stored as blobs, but don't have a digest in the traditional sense (they might have a signature covering parts of their contents, but that is verified already when loading the manifest). this commit does not cover pull/sync code which copies blobs and chunks as-is without decoding them. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2020-08-04 07:27:56 +02:00
Dietmar Maurer	ff86ef00a7	cleanup: manifest is always CryptMode::None	2020-07-31 10:25:30 +02:00
Dominik Csapak	adfdc36936	verify: keep track and log which dirs failed the verification so that we can print a list at the end of the worker which backups are corrupt. this is useful if there are many snapshots and some in between had an error. Before this patch, the task log simply says to 'look in the logs' but if the log is very long it makes it hard to see what exactly failed. Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>	2020-07-30 09:39:37 +02:00
Dominik Csapak	d8594d87f1	verify: keep also track of corrupt chunks so that we do not have to verify a corrupt one multiple times Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>	2020-07-30 09:39:37 +02:00
Dominik Csapak	f66f537da9	verify: check all chunks of an index, even if we encounter a corrupt one this makes it easier to see which chunks are corrupt (and enables us in the future to build a 'complete' list of corrupt chunks) Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>	2020-07-30 09:39:37 +02:00
Dietmar Maurer	2aaae9705e	src/backup/verify.rs: try to verify chunks only once We use a HashSet (per BackupGroup) to track already verified chunks.	2020-07-29 13:29:13 +02:00
Dietmar Maurer	39f18b30b6	src/backup/data_blob.rs: new load_from_reader(), which verifies the CRC And make verify_crc private for now. We always call load_from_reader() to verify the CRC. Also add load_chunk() to datastore.rs (from chunk_store::read_chunk())	2020-07-28 10:23:16 +02:00
Wolfgang Bumiller	521a0acb2e	DataStore::load_manifest: also return CryptMode Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>	2020-07-08 09:19:53 +02:00
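
The April 2021 commits above (7f394c807b, 26af61debc, 7b2d3a5fe9, 2e1b63fb25) describe a verify loop that sorts chunks by inode, skips chunks already known to be verified or corrupt, and checks for abort only on every 1024th iteration. The following is a minimal sketch of that combination, not the actual proxmox-backup code: the names `Digest`, `should_check_abort`, `read_order` and `verify_chunks` are hypothetical, the inode numbers are assumed to have been obtained from a prior stat call, and chunk loading is replaced by a caller-supplied closure.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a 32-byte chunk digest.
type Digest = [u8; 32];

/// True when the 10 least-significant bits of the counter are zero,
/// i.e. on iterations 0, 1024, 2048, ... (a mask instead of a modulo).
fn should_check_abort(iteration: usize) -> bool {
    (iteration & 1023) == 0
}

/// Returns index positions in read order: sorted by inode, with chunks
/// already known as verified or corrupt left out.
/// `chunks` holds one (digest, inode) pair per index position.
fn read_order(
    chunks: &[(Digest, u64)],
    verified: &HashSet<Digest>,
    corrupt: &HashSet<Digest>,
) -> Vec<usize> {
    let mut list: Vec<(usize, u64)> = chunks
        .iter()
        .enumerate()
        .filter(|(_, (digest, _))| !verified.contains(digest) && !corrupt.contains(digest))
        .map(|(pos, (_, inode))| (pos, *inode))
        .collect();
    // Stable sort: positions with equal inodes keep their index order.
    list.sort_by_key(|&(_, inode)| inode);
    list.into_iter().map(|(pos, _)| pos).collect()
}

/// Verifies chunks in inode order and returns how many were actually checked.
/// `check_chunk` stands in for loading and verifying a chunk from disk.
fn verify_chunks(
    chunks: &[(Digest, u64)],
    verified: &mut HashSet<Digest>,
    corrupt: &mut HashSet<Digest>,
    is_aborted: impl Fn() -> bool,
    check_chunk: impl Fn(&Digest) -> bool,
) -> Result<usize, String> {
    let order = read_order(chunks, verified, corrupt);
    let mut checked = 0;
    for (i, pos) in order.into_iter().enumerate() {
        if should_check_abort(i) && is_aborted() {
            return Err("verification aborted".to_string());
        }
        let digest = chunks[pos].0;
        // Re-check here: a chunk repeated within the same index was not
        // yet known when the read order was built.
        if verified.contains(&digest) || corrupt.contains(&digest) {
            continue;
        }
        checked += 1;
        if check_chunk(&digest) {
            verified.insert(digest);
        } else {
            corrupt.insert(digest);
        }
    }
    Ok(checked)
}
```

In this sketch the second skip check inside the loop is what prevents a chunk that appears twice in one index from being checked twice, which is the regression 26af61debc describes. Each `(usize, u64)` entry takes 16 bytes, matching the roughly 16 MiB per 1_000_000 chunks estimated in 7f394c807b.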


54 Commits