Commit Graph

59 Commits

Author SHA1 Message Date
Wolfgang Bumiller
b2065dc7d2 cleanup proxmox_backup::backup module
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2021-08-30 14:14:04 +02:00
Wolfgang Bumiller
c23192d34e move chunk_store to pbs-datastore
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2021-07-07 14:37:47 +02:00
Wolfgang Bumiller
770a36e53a add pbs-tools subcrate
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2021-07-06 15:10:37 +02:00
Dominik Csapak
4921a411ad backup/datastore: refactor chunk inode sorting to the datastore
so that we can reuse that information

the removal of the adding to the corrupted list is ok, since
'get_chunks_in_order' returns them at the end of the list
and we do the same if the loading fails later in 'verify_index_chunks'
so we still mark them corrupt
(assuming that the load will fail if the stat does)

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-06-28 12:14:52 +02:00
Hannes Laimer
037e6c0ca8 verify-job: move snapshot filter into function
preparatory steps for fixing #3459

Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>
Reviewed-By: Dominik Csapak <d.csapak@proxmox.com>
Tested-By: Dominik Csapak <d.csapak@proxmox.com>
2021-06-28 11:03:44 +02:00
Thomas Lamprecht
2e1b63fb25 backup verify: do not check every loop iteration for abort/shutdown
only check every 1024'th, which is cheaper to do than a modulo, as we
can just mask the 10 least-significant-bits and check if the result
is zero.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-15 13:21:36 +02:00
Thomas Lamprecht
7b2d3a5fe9 backup verify: unify check if chunk can be skipped
This also re-checks the corrupt chunk list before actually loading a
chunk.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-15 13:21:07 +02:00
Thomas Lamprecht
26af61debc backup verify: re-check if we can skip a chunk in the actual verify loop
Fixes a non-negligible performance regression from commit
7f394c807b

While we skip known-verified chunks in the stat-and-inode-sort loop,
those are only the ones from previous indexes. If there's a repeated
chunk in one index they would get re-verified more often as required.

So, add the check again explicitly to the read+verify loop.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-15 10:00:06 +02:00
Thomas Lamprecht
2ab12cd0cb verify: add comment for inode sorting
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-14 14:39:24 +02:00
Thomas Lamprecht
c894909e17 verify: partially rust fmt
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-14 14:39:24 +02:00
Dominik Csapak
7f394c807b backup/verify: improve speed by sorting chunks by inode
before reading the chunks from disk in the order of the index file,
stat them first and sort them by inode number.

this can have a very positive impact on read speed on spinning disks,
even with the additional stat'ing of the chunks.

memory footprint should be tolerable, for 1_000_000 chunks
we need about ~16MiB of memory (Vec of 64bit position + 64bit inode)
(assuming 4MiB Chunks, such an index would reference 4TiB of data)

two small benchmarks (single spinner, ext4) here showed an improvement from
~430 seconds to ~330 seconds for a 32GiB fixed index
and from
~160 seconds to ~120 seconds for a 10GiB dynamic index

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-04-14 14:39:24 +02:00
Fabian Grünbichler
9c26a3d61a verify: factor out common parameters
all the verify methods pass along the following:
- task worker
- datastore
- corrupt and verified chunks

might as well pull that out into a common type, with the added bonus of
now having a single point for construction instead of copying the
default capacaties in three different modules..

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2021-01-26 09:54:49 +01:00
Fabian Grünbichler
397356096a clippy: remove needless bool literals
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2021-01-20 16:23:52 +01:00
Fabian Grünbichler
7e25b9aaaa verify: use same progress as pull
percentage of verified groups, interpolating based on snapshot count
within the group. in most cases, this will also be closer to 'real'
progress since added snapshots (those which will be verified) in active
backup groups will be roughly evenly distributed, while number of total
snapshots per group will be heavily skewed towards those groups which
have existed the longest, even though most of those old snapshots will
only be re-verified very infrequently.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-12-01 06:22:55 +01:00
Fabian Grünbichler
7f3b0f67e7 remove BackupGroup::list_groups
BackupInfo::list_backup_groups is identical code-wise, and makes more
sense as entry point for listing groups.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-12-01 06:09:44 +01:00
Fabian Grünbichler
9f9a661b1a verify: cleanup logging order/messages
otherwise we end up printing warnings before the start message..

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-11-10 14:11:36 +01:00
Fabian Grünbichler
1b1cab8321 verify: log/warn on invalid owner
in order to trigger a notification/make the problem more visible than
just in syslog.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-11-10 14:11:36 +01:00
Fabian Grünbichler
414c23facb fix #3060:: improve get_owner error handling
log invalid owners to system log, and continue with next group just as
if permission checks fail for the following operations:
- verify store with limited permissions
- list store groups
- list store snapshots

all other call sites either handle it correctly already (sync/pull), or
operate on a single group/snapshot and can bubble up the error.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-11-10 12:58:44 +01:00
Fabian Grünbichler
09f6a24078 verify: introduce & use new Datastore.Verify privilege
for verifying a whole datastore. Datastore.Backup now allows verifying
only backups owned by the triggering user.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-10-30 16:36:52 +01:00
Dietmar Maurer
d771a608f5 verify: directly pass manifest to filter function
In order to avoid loading the manifest twice during verify.
2020-10-29 07:59:19 +01:00
Thomas Lamprecht
b4b14dc16e do_verification_job: fix "never-reverify" and refactor/comment
commit a4915dfc2b made a wrong fix, as
it did not observed that the last expressions was done under the
invariant that we had a last verification result, because if none
could be loaded we already returned true (include).

It thus broke the case for "never re-verify", which is important when
using multiple schedules, a more high frequent one for new,
unverified snapshots, and a low frequency to re-verify older snapshots,
e.g., monthly.

Fix this case again, rework the code to avoid this easy to oversee
invariant. Use a nested match to better express the implication of
each setting, and add some comments.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-10-28 16:12:09 +01:00
Dietmar Maurer
328df3b507 verify: avoid generics and use &dyn Fn() for filter 2020-10-28 13:19:21 +01:00
Dietmar Maurer
a4915dfc2b verify: improve code reuse, fix filter function
Try to reuse verify_all_backups(), because this function has better
logging and well defined snaphot order.
2020-10-28 12:58:15 +01:00
Dietmar Maurer
1298618a83 move jobstate to server 2020-10-28 07:37:01 +01:00
Hannes Laimer
8d1beca7e8 api2: add verification admin endpoint and do_verification_job function
Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>
2020-10-21 12:51:35 +02:00
Stefan Reiter
bcc2880461 add verify_backup_dir_with_lock for callers already holding locks
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-10-20 10:49:19 +02:00
Stefan Reiter
1a374fcfd6 datastore: add manifest locking
Avoid races when updating manifest data by flocking a lock file.
update_manifest is used to ensure updates always happen with the lock
held.

Snapshot deletion also acquires the lock, so it cannot interfere with an
outstanding manifest write.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-10-16 09:34:12 +02:00
Stefan Reiter
883aa6d5a4 datastore: remove load_manifest_json
There's no point in having that as a seperate method, just parse the
thing into a struct and write it back out correctly.

Also makes further changes to the method simpler.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-10-15 07:19:32 +02:00
Stefan Reiter
bfa54f2e85 verify: acquire shared snapshot flock and skip on error
If we can't acquire a lock (either because the snapshot disappeared, it
is about to be forgotten/pruned, or it is currently still running) skip
the snapshot. Hold the lock during verification, so that it cannot be
deleted while we are still verifying.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-10-15 07:09:34 +02:00
Wolfgang Bumiller
f6b1d1cc66 don't require WorkerTask in backup/
To untangle the server code from the actual backup
implementation.
It would be ideal if the whole backup/ dir could become its
own crate with minimal dependencies, certainly without
depending on the actual api server. That would then also be
used more easily to create forensic tools for all the data
file types we have in the backup repositories.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2020-10-12 14:11:57 +02:00
Dietmar Maurer
a71bc08ff4 src/tools/parallel_handler.rs: remove lifetime hacks, require 'static
In theory, one can do std::mem::forget, and ignore the drop handler. With
the lifetime hack, this could result in a crash.

So we simply require 'static lifetime now (futures also needs that).
2020-10-01 14:52:48 +02:00
Dietmar Maurer
f21508b9e1 src/backup/verify.rs: use ParallelHandler to verify chunks 2020-09-26 11:14:37 +02:00
Dietmar Maurer
ee7a308de4 src/backup/verify.rs: cleanup use clause 2020-09-26 10:23:44 +02:00
Stefan Reiter
d10332a15d SnapshotVerifyState: use enum for state
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-09-15 13:06:04 +02:00
Dietmar Maurer
5656888cc9 verify: fix done count
We need to filter out benchmark group earlier
2020-09-10 09:06:33 +02:00
Dietmar Maurer
5fdc5a6f3d verify: skip benchmark directory 2020-09-10 08:44:18 +02:00
Dietmar Maurer
aadcc2815c cleanup rename_corrupted_chunk: avoid duplicate format macro 2020-09-08 12:29:53 +02:00
Stefan Reiter
0f3b7efa84 verify: rename corrupted chunks with .bad extension
This ensures that following backups will always upload the chunk,
thereby replacing it with a correct version again.

Format for renaming is <digest>.<counter>.bad where <counter> is used if
a chunk is found to be bad again before a GC cleans it up.

Care has been taken to deliberately only rename a chunk in conditions
where it is guaranteed to be an error in the chunk itself. Otherwise a
broken index file could lead to an unwanted mass-rename of chunks.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-09-08 12:20:57 +02:00
Stefan Reiter
7c77e2f94a verify: fix log units
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-09-08 12:10:19 +02:00
Dietmar Maurer
deef63699e verify: also fail on server shutdown 2020-09-02 09:50:17 +02:00
Dietmar Maurer
63d9aca96f verify: log progress 2020-09-02 07:43:28 +02:00
Dietmar Maurer
4f09d31085 src/backup/verify.rs: use global hashes (instead of per group)
This makes verify more predictable.
2020-09-01 13:33:04 +02:00
Dietmar Maurer
6b809ff59b src/backup/verify.rs: use separate thread to load data 2020-09-01 12:56:25 +02:00
Thomas Lamprecht
3b2046d263 save last verify result in snapshot manifest
Save the state ("ok" or "failed") and the UPID of the respective
verify task. With this we can easily allow to open the relevant task
log and show when the last verify happened.

As we already load the manifest when listing the snapshots, just add
it there directly.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-08-26 07:35:13 +02:00
Dietmar Maurer
7ae571e7cb verify: speedup - only verify chunks once
We need to do the check before we load the chunk.
2020-08-25 08:52:24 +02:00
Dietmar Maurer
4264c5023b verify: sort backup groups 2020-08-25 08:38:47 +02:00
Fabian Grünbichler
9a38fa29c2 verify: also check chunk CryptMode
and in-line verify_stored_chunk to avoid double-loading each chunk.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-08-11 09:56:20 +02:00
Fabian Grünbichler
8819d1f2f5 blobs: attempt to verify on decode when possible
regular chunks are only decoded when their contents are accessed, in
which case we need to have the key anyway and want to verify the digest.

for blobs we need to verify beforehand, since their checksums are always
calculated based on their raw content, and stored in the manifest.

manifests are also stored as blobs, but don't have a digest in the
traditional sense (they might have a signature covering parts of their
contents, but that is verified already when loading the manifest).

this commit does not cover pull/sync code which copies blobs and chunks
as-is without decoding them.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-08-04 07:27:56 +02:00
Dietmar Maurer
ff86ef00a7 cleanup: manifest is always CryptMode::None 2020-07-31 10:25:30 +02:00
Dominik Csapak
adfdc36936 verify: keep track and log which dirs failed the verification
so that we can print a list at the end of the worker which backups
are corrupt.

this is useful if there are many snapshots and some in between had an
error. Before this patch, the task log simply says to 'look in the logs'
but if the log is very long it makes it hard to see what exactly failed.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2020-07-30 09:39:37 +02:00