all the verify methods pass along the following:
- task worker
- datastore
- corrupt and verified chunks
might as well pull that out into a common type, with the added bonus of
now having a single point for construction instead of copying the
default capacaties in three different modules..
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
chunk_stream one can be collapsed, since split == split_to with at set
to buffer.len() anyway.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
in most generic places. this is accompanied by a change in
RpcEnvironment to purposefully break existing call sites.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
To cater to the paranoid, a new datastore-wide setting "verify-new" is
introduced. When set, a verify job will be spawned right after a new
backup is added to the store (only verifying the added snapshot).
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Avoid races when updating manifest data by flocking a lock file.
update_manifest is used to ensure updates always happen with the lock
held.
Snapshot deletion also acquires the lock, so it cannot interfere with an
outstanding manifest write.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
There's no point in having that as a seperate method, just parse the
thing into a struct and write it back out correctly.
Also makes further changes to the method simpler.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
This can slow things down by a lot on setups with (relatively) high
seek time, in the order of doubling the backup times if cache isn't
populated with the last backups chunk inode info.
Effectively there's nothing known this protects us from in the
codebase. The only thing which was theorized about was the case
where a really long running backup job (over 24 hours) is still
running and writing new chunks, not indexed yet anywhere, then an
update (or manual action) triggers a reload of the proxy. There was
some theory that then a GC in the new daemon would not know about the
oldest writer in the old one, and thus use a less strict atime limit
for chunk sweeping - opening up a window for deleting chunks from the
long running backup.
But, this simply cannot happen as we have a per datastore process
wide flock, which is acquired shared by backup jobs and exclusive by
GC. In the same process GC and backup can both get it, as it has a
process locking granularity. If there's an old daemon with a writer,
that also has the lock open shared, and so no GC in the new process
can get exclusive access to it.
So, with that confirmed we have no need for a "half-assed"
verification in the backup finish step. Rather, we plan to add an
opt-in "full verify each backup on finish" option (see #2988)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
A client can omit uploading chunks in the "known_chunks" list, those
then also won't be written on the server side. Check all those chunks
mentioned in the index but not uploaded for existance and report an
error if they don't exist instead of marking a potentially broken backup
as "successful".
This is only important if the base snapshot references corrupted chunks,
but has not been negatively verified. Also, it is important to only
verify this at the end, *after* all index writers are closed, since only
then can it be guaranteed that no GC will sweep referenced chunks away.
If a chunk is found missing, also mark the previous backup with a
verification failure, since we know the missing chunk has to referenced
in it (only way it could have been inserted to known_chunks with
checked=false). This has the benefit of automatically doing a
full-upload backup if the user attempts to retry after seeing the new
error, instead of requiring a manual verify or forget.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
To prevent forgetting the base snapshot of a running backup, and catch
the case when it still happens (e.g. via manual rm) to at least error
out instead of storing a potentially invalid backup.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
This reverts commit d53fbe2474.
The HashSet and "register" function are unnecessary, as we already know
which backup is the one we need to check: the last one, stored as
'last_backup'.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Commit 9fa55e09 "finish_backup: test/verify manifest at server side"
moved the finished-marking above some checks, which means if those fail
the backup would still be marked as successful on the server.
Revert that part and comment the line for the future.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
This should never trigger if everything else works correctly, but it is
still a very cheap check to avoid wrongly marking a backup as "OK" when
in fact some chunks might be missing.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
To prevent a race with a background GC operation, do not allow deletion
of backups who's index might currently be referenced as the "known chunk
list" for successive backups. Otherwise the GC could delete chunks it
thinks are no longer referenced, while at the same time telling the
client that it doesn't need to upload said chunks because they already
exist.
Additionally, prevent deletion of whole backup groups, if there are
snapshots contained that appear to be currently in-progress. This is
currently unlikely to trigger, as that function is only used for sync
jobs, but it's a useful safeguard either way.
Deleting a single snapshot has a 'force' parameter, which is necessary
to allow deleting incomplete snapshots on an aborted backup. Pruning
also sets force=true to avoid the check, since it calculates which
snapshots to keep on its own.
To avoid code duplication, the is_finished method is factored out.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
And make verify_crc private for now. We always call load_from_reader() to
verify the CRC.
Also add load_chunk() to datastore.rs (from chunk_store::read_chunk())
To support incremental backups (where not all chunks are sent to the
server), a new parameter "reuse-csum" is introduced on the
"create_fixed_index" API call. When set and equal to last backups'
checksum, the backup writer clones the data from the last index of this
archive file, and only updates chunks it actually receives.
In incremental mode some checks usually done on closing an index cannot
be made, since they would be inaccurate.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
if *only* data chunks are registered (high chance during incremental
backup), then chunk_count might be one lower then upload_stat.count
because of the zero chunk being unconditionally uploaded but not used.
Thus when subtracting the two, an overflow would occur.
In general, don't let the client make the server panic, instead just set
duplicates to 0.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
we check if all dynamic_writers are closed and if the backup contains
any valid files, we can only mark the backup finished after those
checks, else the backup task gets marked as OK, even though it
is not finished and no cleanups run
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>