with the fix for #2909 (improving handling missing chunks), we
changed from bailing to warning during a garbage collection when
updating the atime of a chunk.
but, updating the atime can not only fail when the chunk is missing,
but also on other occasions, e.g. no permissions or more importantly,
no space left on the device. in that case, the atime of a valid and used
chunk cannot be updated, and the second sweep of the gc will remove that chunk.
[0] is a real world example of that happening.
instead, only warn on really missin chunks, and bail on all other
errors.
0: https://forum.proxmox.com/threads/pbs-server-full-two-days-later-almost-empty.83274/
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
all the verify methods pass along the following:
- task worker
- datastore
- corrupt and verified chunks
might as well pull that out into a common type, with the added bonus of
now having a single point for construction instead of copying the
default capacaties in three different modules..
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
the error message don't make sense with an empty default
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
chunk_stream one can be collapsed, since split == split_to with at set
to buffer.len() anyway.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
instead of just logging the error. this should never happen in practice
unless someone is messing with the keyfile, in which case, it's better
to abort.
update tests accordingly (wrong fingerprint should fail, no fingerprint
should get the expected one).
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
the RSA key and the encryption key itself are hard-coded to avoid
stalling the test runs because of lack of entropy, they have no special
significance otherwise.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
this fixes the issue that on some filesystems, you cannot recursively
remove a directory when you hold a lock on a file inside (e.g. nfs/cifs)
it is not really backwards compatible (so during an upgrade, there
could be two daemons have the lock), but since the locking was
broken before (see previous patch) it should not really matter
(also it seems very unlikely that someone will trigger this)
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
'lock_manifest' returns a Result<File, Error> so we always got the result,
even when we did not get the lock, but we acted like we had.
bubble the locking error up
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
percentage of verified groups, interpolating based on snapshot count
within the group. in most cases, this will also be closer to 'real'
progress since added snapshots (those which will be verified) in active
backup groups will be roughly evenly distributed, while number of total
snapshots per group will be heavily skewed towards those groups which
have existed the longest, even though most of those old snapshots will
only be re-verified very infrequently.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
BackupInfo::list_backup_groups is identical code-wise, and makes more
sense as entry point for listing groups.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
WalkDir does not follow symlinks by default anyway, and this behaviour
is not documented anywhere. e.g., if a sysadmin mounts 'extra storage'
for some backup group or type (not knowing that only metadata is stored
in those directories), GC will ignore all the indices contained within
and happily garbage collect their chunks..
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>