Commit Graph

544 Commits

Dietmar Maurer
8317873c06 gc: improve percentage done logs 2020-09-02 10:04:18 +02:00
Dietmar Maurer
deef63699e verify: also fail on server shutdown 2020-09-02 09:50:17 +02:00
Dietmar Maurer
63d9aca96f verify: log progress 2020-09-02 07:43:28 +02:00
Dietmar Maurer
4f09d31085 src/backup/verify.rs: use global hashes (instead of per group)
This makes verify more predictable.
2020-09-01 13:33:04 +02:00
Dietmar Maurer
58d73ddb1d src/backup/data_blob.rs: avoid useless &, data is already a reference 2020-09-01 12:56:25 +02:00
Dietmar Maurer
6b809ff59b src/backup/verify.rs: use separate thread to load data 2020-09-01 12:56:25 +02:00
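
A minimal sketch of the loader-thread pattern described above, using hypothetical load_chunk/verify_chunk closures rather than the actual verify.rs code; the bounded channel keeps chunk loading (I/O) and verification (CPU) overlapped without unbounded memory use.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

type Digest = [u8; 32];

// Verify chunks while a background thread loads them from disk.
fn verify_with_loader_thread(
    digests: Vec<Digest>,
    load_chunk: impl Fn(&Digest) -> Result<Vec<u8>, String> + Send + 'static,
    mut verify_chunk: impl FnMut(&Digest, &[u8]) -> Result<(), String>,
) -> Result<(), String> {
    // bounded channel, so the loader cannot run arbitrarily far ahead
    let (tx, rx) = sync_channel::<(Digest, Result<Vec<u8>, String>)>(3);

    let loader = thread::spawn(move || {
        for digest in digests {
            let data = load_chunk(&digest);
            if tx.send((digest, data)).is_err() {
                break; // receiver is gone, stop loading
            }
        }
    });

    // verification (CPU bound) now overlaps with loading (I/O bound)
    let mut result = Ok(());
    for (digest, data) in rx {
        let data = match data {
            Ok(data) => data,
            Err(err) => {
                result = Err(err);
                break;
            }
        };
        if let Err(err) = verify_chunk(&digest, &data) {
            result = Err(err);
            break;
        }
    }

    loader.join().map_err(|_| "loader thread panicked".to_string())?;
    result
}
```
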
Thomas Lamprecht
49a92084a9 gc: use human readable units for summary
and avoid the "percentage done: X %" phrase

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-08-27 16:06:35 +02:00
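
A small sketch of the kind of byte formatting this refers to, using binary units; the exact helper and the wording of the GC summary may differ.

```rust
// format byte counts with binary units for log output
fn human_readable(bytes: u64) -> String {
    const UNITS: [&str; 5] = ["B", "KiB", "MiB", "GiB", "TiB"];
    let mut value = bytes as f64;
    let mut unit = 0;
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    if unit == 0 {
        format!("{} {}", bytes, UNITS[unit])
    } else {
        format!("{:.2} {}", value, UNITS[unit])
    }
}
```

For example, `human_readable(3_221_225_472)` yields "3.00 GiB".
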
Thomas Lamprecht
3b2046d263 save last verify result in snapshot manifest
Save the state ("ok" or "failed") and the UPID of the respective
verify task. With this we can easily allow opening the relevant task
log and show when the last verify happened.

As we already load the manifest when listing the snapshots, just add
it there directly.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-08-26 07:35:13 +02:00
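
A sketch of storing the verify result in the manifest's unprotected data, assuming serde_json; the field names (verify_state, state, upid) are illustrative, not a quote of the actual manifest code.

```rust
use serde_json::{json, Value};

// panics if `manifest` is not a JSON object (serde_json index behaviour),
// which is acceptable for this sketch
fn set_verify_state(manifest: &mut Value, upid: &str, ok: bool) {
    let state = if ok { "ok" } else { "failed" };
    manifest["unprotected"]["verify_state"] = json!({
        "state": state,
        "upid": upid,
    });
}
```

Since the snapshot list already loads the manifest, the stored state and UPID can be shown (and the task log opened) without extra requests.
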
Thomas Lamprecht
1ffe030123 various typo fixes
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-08-25 18:52:31 +02:00
Dietmar Maurer
7ae571e7cb verify: speedup - only verify chunks once
We need to do the check before we load the chunk.
2020-08-25 08:52:24 +02:00
Dietmar Maurer
4264c5023b verify: sort backup groups 2020-08-25 08:38:47 +02:00
Wolfgang Bumiller
3fa2b983c1 add methods to allocate a DynamicIndexHeader
to avoid `map_struct`, which is actually unsafe because it
does not verify alignment constraints at all

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2020-08-17 11:50:32 +02:00
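
A sketch of the safer alternative to `map_struct`: allocate the header as a normal Rust value (so alignment is guaranteed) and copy the fields out of the byte buffer, instead of casting a possibly unaligned buffer to a struct. The field layout below is simplified and not the real on-disk format.

```rust
use std::convert::TryInto;

// simplified layout, not the real DynamicIndexHeader on-disk format
#[derive(Default)]
#[repr(C)]
struct DynamicIndexHeader {
    magic: [u8; 8],
    uuid: [u8; 16],
    ctime: u64,
    index_csum: [u8; 32],
}

impl DynamicIndexHeader {
    /// Parse from a byte buffer without ever casting the buffer to a struct,
    /// so no alignment assumptions are made about `buf`.
    fn from_bytes(buf: &[u8]) -> Result<Box<Self>, String> {
        if buf.len() < 64 {
            return Err(format!("dynamic index header too short: {} bytes", buf.len()));
        }
        // the Box is a properly aligned, zero-initialized allocation
        let mut header = Box::<DynamicIndexHeader>::default();
        header.magic.copy_from_slice(&buf[0..8]);
        header.uuid.copy_from_slice(&buf[8..24]);
        header.ctime = u64::from_le_bytes(buf[24..32].try_into().unwrap());
        header.index_csum.copy_from_slice(&buf[32..64]);
        Ok(header)
    }
}
```
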
Stefan Reiter
8b5f72b176 Revert "backup: ensure base snapshots are still available after backup"
This reverts commit d53fbe2474.

The HashSet and "register" function are unnecessary, as we already know
which backup is the one we need to check: the last one, stored as
'last_backup'.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-08-11 11:03:53 +02:00
Stefan Reiter
f23f75433f backup: flock snapshot on backup start
A flock on the snapshot dir itself is used in addition to the group dir
lock. The lock is used to avoid races with forget and prune, while
having more granularity than the group lock (i.e. the group lock is
necessary to prevent more than one backup per group, but the snapshot
lock still allows backups unrelated to the currently running one to be
forgotten/pruned).

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-08-11 11:02:21 +02:00
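
A sketch of the snapshot-directory lock, assuming the nix crate's flock wrapper; the real code may go through a shared locking helper, this only shows the mechanism.

```rust
use std::fs::File;
use std::os::unix::io::AsRawFd;
use std::path::Path;

use nix::fcntl::{flock, FlockArg};

// Open the snapshot directory and take an exclusive, non-blocking flock on it.
// Keeping the returned File alive keeps the lock; dropping it releases the lock.
fn lock_snapshot_dir(snapshot_path: &Path) -> Result<File, String> {
    let dir = File::open(snapshot_path)
        .map_err(|err| format!("cannot open {:?} - {}", snapshot_path, err))?;

    flock(dir.as_raw_fd(), FlockArg::LockExclusiveNonblock).map_err(|err| {
        format!("snapshot {:?} is locked by another operation - {}", snapshot_path, err)
    })?;

    Ok(dir)
}
```
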
Stefan Reiter
6d6b4e72d3 datastore: prevent in-use deletion with locks instead of heuristic
Attempt to lock the backup directory to be deleted; if that works, keep the
lock until the deletion is complete. This way we ensure that no other
locking operation (e.g. using a snapshot as base for another backup) can
happen concurrently.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-08-11 11:00:29 +02:00
Dietmar Maurer
e434258592 src/backup/backup_info.rs: remove BackupGroup lock()
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-08-11 10:58:35 +02:00
Fabian Grünbichler
882c082369 mark signed manifests as such
for less-confusing display in the web interface

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-08-11 09:56:53 +02:00
Fabian Grünbichler
9a38fa29c2 verify: also check chunk CryptMode
and in-line verify_stored_chunk to avoid double-loading each chunk.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-08-11 09:56:20 +02:00
Fabian Grünbichler
14f6c9cb8b chunk readers: ensure chunk/index CryptMode matches
an encrypted Index should never reference a plain-text chunk, and an
unencrypted Index should never reference an encrypted chunk.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-08-11 09:54:22 +02:00
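
A sketch of the consistency rule from the message above; the enum mirrors the CryptMode variants, but the function name and error strings are illustrative.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CryptMode {
    None,
    Encrypt,
    SignOnly,
}

// An encrypted index must never reference a plain-text chunk, and an
// unencrypted (or merely signed) index must never reference an encrypted chunk.
fn check_crypt_mode(index_mode: CryptMode, chunk_mode: CryptMode) -> Result<(), String> {
    match (index_mode, chunk_mode) {
        (CryptMode::None, CryptMode::Encrypt)
        | (CryptMode::SignOnly, CryptMode::Encrypt) => {
            Err("unexpected encrypted chunk referenced by plain index".to_string())
        }
        (CryptMode::Encrypt, CryptMode::None)
        | (CryptMode::Encrypt, CryptMode::SignOnly) => {
            Err("unexpected plain chunk referenced by encrypted index".to_string())
        }
        _ => Ok(()),
    }
}
```
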
Wolfgang Bumiller
e7cb4dc50d introduce Username, Realm and Userid api types
and begin splitting up types.rs as it has grown quite large
already

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2020-08-10 12:05:01 +02:00
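
A sketch of the newtype split into Username, Realm and Userid; the parser below (splitting at the last '@') is illustrative, and the real types additionally carry API schema validation.

```rust
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Username(String);

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Realm(String);

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Userid {
    name: Username,
    realm: Realm,
}

impl std::str::FromStr for Userid {
    type Err = String;

    // illustrative: parse "user@realm", splitting at the last '@'
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let at = s
            .rfind('@')
            .ok_or_else(|| format!("invalid userid {:?}: missing realm", s))?;
        let (name, realm) = (&s[..at], &s[at + 1..]);
        if name.is_empty() || realm.is_empty() {
            return Err(format!("invalid userid {:?}", s));
        }
        Ok(Userid {
            name: Username(name.to_string()),
            realm: Realm(realm.to_string()),
        })
    }
}
```
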
Stefan Reiter
4dbe129284 backup: only allow finished backups as base snapshot
If the datastore holds broken backups for some reason, do not attempt to
base following snapshots on those. This would lead to an error on
/previous, leaving the client no choice but to upload all chunks, even
though there might be potential for incremental savings.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-08-07 07:32:56 +02:00
Oguz Bektas
2f57a433b1 fix #2909: handle missing chunks gracefully in garbage collection
instead of bailing and stopping the entire GC process, warn about the
missing chunks and continue.

this results in "TASK WARNINGS: X" as the status.

Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
2020-08-06 06:36:48 +02:00
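
A sketch of the warn-and-continue behaviour, with an illustrative warning counter standing in for the task's warning mechanism.

```rust
use std::io::ErrorKind;
use std::path::Path;

// Mark one chunk as used; a missing chunk is logged and counted instead of
// aborting the whole GC run. Any other error remains fatal.
fn touch_chunk(chunk_path: &Path, warnings: &mut u64) -> Result<(), String> {
    match std::fs::metadata(chunk_path) {
        Ok(_) => {
            // the real code updates the atime here to mark the chunk as used
            Ok(())
        }
        Err(err) if err.kind() == ErrorKind::NotFound => {
            // previously this bailed out and stopped the entire GC process
            eprintln!("WARN: chunk {:?} is missing - {}", chunk_path, err);
            *warnings += 1; // task ends with "TASK WARNINGS: X" instead of an error
            Ok(())
        }
        Err(err) => Err(format!("cannot stat chunk {:?} - {}", chunk_path, err)),
    }
}
```
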
Wolfgang Bumiller
98c259b4c1 remove timer and lock functions, fix building with proxmox 0.3.2
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2020-08-04 11:33:02 +02:00
Aaron Lauterer
d3d566f7bd GC: use time pre phase1 to calculate min_atime in phase2
Used chunks are marked in phase1 of the garbage collection process by
using the atime property. Each used chunk gets touched so that the atime
gets updated (if older than 24h, see relatime).

Should there ever be a situation in which the phase1 in the GC run needs
a very long time to finish, it could happen that the grace period
calculated in phase2 is not long enough and thus the marking of the
chunks (atime) becomes invalid. This would result in the removal of
needed chunks.

Even though the likelihood of this happening is very low, using the
timestamp from right before phase1 starts to calculate the grace period
in phase2 avoids this situation.

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
2020-08-04 10:19:05 +02:00
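
A sketch of the timing change, assuming epoch seconds; the grace period value is an assumption mirroring the 24h relatime window mentioned above, not a quote of the GC code.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

fn garbage_collect() -> Result<(), String> {
    // take the reference time *before* phase1 starts ...
    let phase1_start = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map_err(|err| format!("clock error: {}", err))?
        .as_secs();

    // ... phase1: mark used chunks by touching them (atime update) ...

    // ... so the phase2 cutoff no longer depends on how long phase1 took
    let grace_period = 24 * 3600 + 5 * 60; // assumed value
    let min_atime = phase1_start.saturating_sub(grace_period);

    // ... phase2: sweep chunks whose atime is older than `min_atime` ...
    let _ = min_atime;
    Ok(())
}
```
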
Fabian Grünbichler
8819d1f2f5 blobs: attempt to verify on decode when possible
regular chunks are only decoded when their contents are accessed, in
which case we need to have the key anyway and want to verify the digest.

for blobs we need to verify beforehand, since their checksums are always
calculated based on their raw content, and stored in the manifest.

manifests are also stored as blobs, but don't have a digest in the
traditional sense (they might have a signature covering parts of their
contents, but that is verified already when loading the manifest).

this commit does not cover pull/sync code which copies blobs and chunks
as-is without decoding them.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-08-04 07:27:56 +02:00
Wolfgang Bumiller
d9b8e2c795 pxar: better error handling on extract
Errors while applying metadata are not considered fatal by `pxar
extract` by default, unless `--strict` was passed, in which case it
bails out immediately.

It'll still return an error exit status if something had
failed along the way.

Note that most other errors will still cause it to bail out
(eg. errors creating files, or I/O errors while writing
the contents).

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2020-08-03 09:40:55 +02:00
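
A sketch of the strict/non-strict handling of metadata errors; the handler type and its methods are assumptions, not the pxar extract API.

```rust
struct ErrorHandler {
    strict: bool,
    had_errors: bool,
}

impl ErrorHandler {
    fn handle_metadata_error(&mut self, err: String) -> Result<(), String> {
        if self.strict {
            // --strict: bail out immediately
            return Err(err);
        }
        // default: log, remember for the exit status, keep extracting
        eprintln!("warning: failed to apply metadata: {}", err);
        self.had_errors = true;
        Ok(())
    }

    // non-fatal failures still surface as a non-zero exit status
    fn exit_code(&self) -> i32 {
        if self.had_errors { 1 } else { 0 }
    }
}
```
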
Dietmar Maurer
ff86ef00a7 cleanup: manifest is always CryptMode::None 2020-07-31 10:25:30 +02:00
Dietmar Maurer
a4acb6ef84 lock_file: return std::io::Error 2020-07-31 08:53:00 +02:00
Dietmar Maurer
e443902583 src/backup/datastore.rs: add helpers to load/store manifest
We want this so we can modify the manifest's "unprotected" data, for
example to add upload statistics, notes, ...
2020-07-31 07:45:47 +02:00
Dietmar Maurer
1fc82c41f2 src/api2/backup.rs: acquire backup lock earlier in create_locked_backup_group() 2020-07-30 11:03:05 +02:00
Dominik Csapak
adfdc36936 verify: keep track and log which dirs failed the verification
so that we can print a list of the corrupt backups at the end of the
worker task.

this is useful if there are many snapshots and some in between had an
error. Before this patch, the task log simply says to 'look in the logs',
but if the log is very long that makes it hard to see what exactly failed.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2020-07-30 09:39:37 +02:00
Dominik Csapak
d8594d87f1 verify: keep also track of corrupt chunks
so that we do not have to verify a corrupt one multiple times

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2020-07-30 09:39:37 +02:00
Dominik Csapak
f66f537da9 verify: check all chunks of an index, even if we encounter a corrupt one
this makes it easier to see which chunks are corrupt
(and enables us in the future to build a 'complete' list of
corrupt chunks)

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2020-07-30 09:39:37 +02:00
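
A sketch combining the two verify changes above: remember corrupt chunks so they are not re-checked, and keep going after a corrupt chunk so all chunks of an index get checked. The check closure and digest type are illustrative; the returned error count is what lets the caller record the snapshot in the failed list.

```rust
use std::collections::HashSet;

type Digest = [u8; 32];

fn hex(digest: &Digest) -> String {
    digest.iter().map(|b| format!("{:02x}", b)).collect()
}

// Check every chunk referenced by an index, skipping already verified ones
// and counting (but not re-checking) already known corrupt ones.
fn verify_index_chunks(
    chunks: &[Digest],
    verified: &mut HashSet<Digest>,
    corrupt: &mut HashSet<Digest>,
    mut check: impl FnMut(&Digest) -> Result<(), String>,
) -> usize {
    let mut errors = 0;
    for digest in chunks {
        if verified.contains(digest) {
            continue; // already known good
        }
        if corrupt.contains(digest) {
            errors += 1; // known corrupt, do not load/verify it again
            continue;
        }
        match check(digest) {
            Ok(()) => {
                verified.insert(*digest);
            }
            Err(err) => {
                eprintln!("chunk {} failed verification: {}", hex(digest), err);
                corrupt.insert(*digest);
                errors += 1; // keep going: check the remaining chunks too
            }
        }
    }
    errors
}
```
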
Stefan Reiter
d53fbe2474 backup: ensure base snapshots are still available after backup
This should never trigger if everything else works correctly, but it is
still a very cheap check to avoid wrongly marking a backup as "OK" when
in fact some chunks might be missing.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-07-30 08:28:54 +02:00
Stefan Reiter
95bda2f25d backup: use flock on backup group to forbid multiple backups at once
Multiple backups within one backup group don't really make sense, but
break all sorts of guarantees (e.g. a second backup started after a
first would use a "known-chunks" list from the previous unfinished one,
which would be empty - but using the list from the last finished one is
not a fix either, as that one could be deleted or pruned once the first
simultaneous backup is finished).

Fix it by only allowing one backup per backup group at one time. This is
done via a flock on the backup group directory, thus remaining intact
even after a reload.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-07-30 08:26:26 +02:00
Stefan Reiter
c9756b40d1 datastore: prevent deletion of snaps in use as "previous backup"
To prevent a race with a background GC operation, do not allow deletion
of backups whose index might currently be referenced as the "known chunk
list" for successive backups. Otherwise the GC could delete chunks it
thinks are no longer referenced, while at the same time telling the
client that it doesn't need to upload said chunks because they already
exist.

Additionally, prevent deletion of whole backup groups if they contain
snapshots that appear to be currently in progress. This is
currently unlikely to trigger, as that function is only used for sync
jobs, but it's a useful safeguard either way.

Deleting a single snapshot has a 'force' parameter, which is necessary
to allow deleting incomplete snapshots on an aborted backup. Pruning
also sets force=true to avoid the check, since it calculates which
snapshots to keep on its own.

To avoid code duplication, the is_finished method is factored out.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-07-30 08:26:01 +02:00
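
A sketch of the is_finished check and the force parameter; the manifest blob name and the exact checks done by the real deletion path (which also takes the snapshot lock) are assumptions here.

```rust
use std::path::PathBuf;

const MANIFEST_BLOB_NAME: &str = "index.json.blob"; // assumed name

struct BackupDir {
    path: PathBuf,
}

impl BackupDir {
    /// A snapshot counts as finished once its manifest has been written.
    fn is_finished(&self) -> bool {
        self.path.join(MANIFEST_BLOB_NAME).exists()
    }
}

fn remove_backup_dir(snapshot: &BackupDir, force: bool) -> Result<(), String> {
    if !force && !snapshot.is_finished() {
        // without force, refuse to delete snapshots that look in-progress;
        // force=true is needed to clean up after an aborted backup
        return Err(format!("cannot remove unfinished snapshot {:?}", snapshot.path));
    }
    std::fs::remove_dir_all(&snapshot.path)
        .map_err(|err| format!("removing {:?} failed - {}", snapshot.path, err))
}
```
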
Dietmar Maurer
2aaae9705e src/backup/verify.rs: try to verify chunks only once
We use a HashSet (per BackupGroup) to track already verified chunks.
2020-07-29 13:29:13 +02:00
Dietmar Maurer
39f18b30b6 src/backup/data_blob.rs: new load_from_reader(), which verifies the CRC
And make verify_crc private for now. We always call load_from_reader() to
verify the CRC.

Also add load_chunk() to datastore.rs (from chunk_store::read_chunk())
2020-07-28 10:23:16 +02:00
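
A sketch of CRC verification on load, assuming the crc32fast crate and a simplified blob header (8 byte magic followed by a 4 byte little-endian CRC over the payload); the real DataBlob layout is not quoted here.

```rust
// Return the payload only if its CRC matches the header.
fn load_blob(raw: &[u8]) -> Result<&[u8], String> {
    if raw.len() < 12 {
        return Err("blob too short".to_string());
    }
    let expected = u32::from_le_bytes([raw[8], raw[9], raw[10], raw[11]]);
    let payload = &raw[12..];

    let mut hasher = crc32fast::Hasher::new();
    hasher.update(payload);
    let computed = hasher.finalize();

    if computed != expected {
        return Err(format!("blob CRC mismatch: {:08x} != {:08x}", computed, expected));
    }
    Ok(payload)
}
```
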
Dietmar Maurer
bccdc5fa04 src/backup/manifest.rs: cleanup - again, avoid recursive call to write_canonical_json
And use re-borrow instead of dyn trait casting.
2020-07-27 10:31:34 +02:00
Dietmar Maurer
0bf7ba6c92 src/backup/manifest.rs: cleanup - avoid recursive call to write_canonical_json 2020-07-27 08:48:11 +02:00
Thomas Lamprecht
3a3af6e2b6 backup manifest: make lookup_file_info public
useful to get info, like whether the previous snapshot was encrypted,
in libproxmox-backup-qemu

Requested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-07-23 10:39:21 +02:00
Thomas Lamprecht
7e42ccdaf2 fixed index: chunk_from_offset: avoid slow modulo operation
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-07-22 17:46:07 +02:00
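
Two illustrative ways to map an offset into a fixed-size index without a modulo: reuse the division result, or use a bitmask when the chunk size is a power of two. Which of these the commit actually uses is not quoted here.

```rust
// derive (chunk index, intra-chunk offset) for a fixed-size index
fn chunk_from_offset(chunk_size: u64, offset: u64) -> (usize, u64) {
    let idx = offset / chunk_size;
    // subtraction reuses the division result instead of `offset % chunk_size`
    let offset_in_chunk = offset - idx * chunk_size;
    (idx as usize, offset_in_chunk)
}

// power-of-two variant: a bitmask replaces the modulo entirely
fn offset_in_chunk_pow2(chunk_size: u64, offset: u64) -> u64 {
    debug_assert!(chunk_size.is_power_of_two());
    offset & (chunk_size - 1)
}
```
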
Stefan Reiter
e713ee5c56 remove BufferedFixedReader interface
replaced by AsyncIndexReader

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-07-22 17:28:49 +02:00
Stefan Reiter
ec5f9d3525 implement AsyncSeek for AsyncIndexReader
Requires updating the AsyncRead implementation to cope with byte-wise
seeks to intra-chunk positions.

Uses chunk_from_offset to get locations within chunks, but tries to
avoid it for sequential reads so as not to reduce performance compared
to before.

AsyncSeek needs to use the temporary seek_to_pos to avoid changing the
position in case an invalid seek is given and it needs to error in
poll_complete.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-07-22 17:28:49 +02:00
Stefan Reiter
d0463b67ca add and implement chunk_from_offset for IndexFile
Necessary for byte-wise seeking through chunks in an index.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-07-22 17:28:49 +02:00
Thomas Lamprecht
2ff4c2cd5f datastore/chunker: fix comment typos
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-07-22 16:12:49 +02:00
Thomas Lamprecht
c3b090ac8a backup: list images: handle walkdir error, catch "lost+found"
We support using an ext4 mountpoint directly as datastore and even do
so ourselves when creating one through the disk management code.

Such ext4 mountpoints have a lost+found directory which only root can
traverse into. As the GC image listing runs as the backup:backup user,
walkdir gets an error.

We cannot just ignore all permission errors, as they could lead to
missing some backup indexes and thus possibly sweeping more chunks
than desired. While *normally* that should not happen through our
stack, we have already had user reports of rsyncs being used to move a
datastore from an old to a new server with the permissions ending up wrong.

So for now, stay very strict: only allow a "lost+found" directory
as an immediate child of the datastore base directory, nothing else.

If deemed safe, this can always be made less strict. Possibly by
filtering the known backup-types on the highest level first.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-07-22 16:01:55 +02:00
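
A sketch of a tolerant image listing, assuming the walkdir crate: a lost+found directory directly below the datastore base is skipped up front, while every other error stays fatal so no backup index can be silently missed. The real code may handle the error after the fact rather than filtering ahead of time.

```rust
use std::path::{Path, PathBuf};

use walkdir::WalkDir;

fn list_image_paths(base: &Path) -> Result<Vec<PathBuf>, String> {
    let mut paths = Vec::new();
    for entry in WalkDir::new(base).into_iter().filter_entry(|e| {
        // never descend into a top-level lost+found (only root may traverse it)
        !(e.depth() == 1 && e.file_name() == "lost+found")
    }) {
        match entry {
            Ok(entry) => {
                if entry.file_type().is_file() {
                    paths.push(entry.path().to_owned());
                }
            }
            // any other error (e.g. permission problems elsewhere) stays fatal,
            // so we cannot miss backup indexes and sweep too many chunks
            Err(err) => return Err(format!("walking {:?} failed - {}", base, err)),
        }
    }
    Ok(paths)
}
```
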
Thomas Lamprecht
c47e294ea7 datastore: fix typo
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-07-22 15:04:14 +02:00
Fabian Grünbichler
25455bd06d fix #2871: close FDs when scanning backup group
otherwise we leak those descriptors and run into EMFILE when a backup
group contains many snapshots.

fcntl::openat and Dir::openat are not the same ;)

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-07-22 09:19:29 +02:00
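
A sketch of why the descriptors leaked and how an owning wrapper (as nix::dir::Dir provides) fixes it; the guard type below uses libc directly and is purely illustrative.

```rust
use std::ffi::CStr;
use std::os::unix::io::RawFd;

// A raw fd returned by openat() is just an integer and is never closed
// automatically; this guard closes it on drop, like nix::dir::Dir does.
struct FdGuard(RawFd);

impl Drop for FdGuard {
    fn drop(&mut self) {
        // ignore errors on close, like most RAII fd wrappers do
        unsafe { libc::close(self.0); }
    }
}

fn open_snapshot_dir(group_fd: RawFd, name: &CStr) -> Result<FdGuard, String> {
    let fd = unsafe {
        libc::openat(
            group_fd,
            name.as_ptr(),
            libc::O_RDONLY | libc::O_DIRECTORY | libc::O_CLOEXEC,
        )
    };
    if fd < 0 {
        return Err(format!("openat failed: {}", std::io::Error::last_os_error()));
    }
    // without the wrapper, dropping the bare fd here would leak it and
    // eventually hit EMFILE when a group contains many snapshots
    Ok(FdGuard(fd))
}
```
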
Aaron Lauterer
b96b11cdb7 chunk_store: Fix typo in bail message
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
2020-07-21 12:51:41 +02:00