Fixes a non-negligible performance regression from commit
7f394c807b
While we skip known-verified chunks in the stat-and-inode-sort loop,
those are only the ones from previous indexes. If there's a repeated
chunk in one index they would get re-verified more often as required.
So, add the check again explicitly to the read+verify loop.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
before reading the chunks from disk in the order of the index file,
stat them first and sort them by inode number.
this can have a very positive impact on read speed on spinning disks,
even with the additional stat'ing of the chunks.
memory footprint should be tolerable, for 1_000_000 chunks
we need about ~16MiB of memory (Vec of 64bit position + 64bit inode)
(assuming 4MiB Chunks, such an index would reference 4TiB of data)
two small benchmarks (single spinner, ext4) here showed an improvement from
~430 seconds to ~330 seconds for a 32GiB fixed index
and from
~160 seconds to ~120 seconds for a 10GiB dynamic index
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
found and semi-manually replaced by using:
codespell -L mut -L crate -i 3 -w
Mostly in comments, but also email notification and two occurrences
of misspelled 'reserved' struct member, which where not used and
cargo build did not complain about the change, soo ...
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
we will reuse that later in the client, so we need it somewhere
we can use from there
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
[add strongly typed ArchiveEntry and put api code into helpers.rs]
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
with the fix for #2909 (improving handling missing chunks), we
changed from bailing to warning during a garbage collection when
updating the atime of a chunk.
but, updating the atime can not only fail when the chunk is missing,
but also on other occasions, e.g. no permissions or more importantly,
no space left on the device. in that case, the atime of a valid and used
chunk cannot be updated, and the second sweep of the gc will remove that chunk.
[0] is a real world example of that happening.
instead, only warn on really missin chunks, and bail on all other
errors.
0: https://forum.proxmox.com/threads/pbs-server-full-two-days-later-almost-empty.83274/
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
all the verify methods pass along the following:
- task worker
- datastore
- corrupt and verified chunks
might as well pull that out into a common type, with the added bonus of
now having a single point for construction instead of copying the
default capacaties in three different modules..
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
the error message don't make sense with an empty default
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
chunk_stream one can be collapsed, since split == split_to with at set
to buffer.len() anyway.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
instead of just logging the error. this should never happen in practice
unless someone is messing with the keyfile, in which case, it's better
to abort.
update tests accordingly (wrong fingerprint should fail, no fingerprint
should get the expected one).
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
the RSA key and the encryption key itself are hard-coded to avoid
stalling the test runs because of lack of entropy, they have no special
significance otherwise.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
this fixes the issue that on some filesystems, you cannot recursively
remove a directory when you hold a lock on a file inside (e.g. nfs/cifs)
it is not really backwards compatible (so during an upgrade, there
could be two daemons have the lock), but since the locking was
broken before (see previous patch) it should not really matter
(also it seems very unlikely that someone will trigger this)
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
'lock_manifest' returns a Result<File, Error> so we always got the result,
even when we did not get the lock, but we acted like we had.
bubble the locking error up
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
percentage of verified groups, interpolating based on snapshot count
within the group. in most cases, this will also be closer to 'real'
progress since added snapshots (those which will be verified) in active
backup groups will be roughly evenly distributed, while number of total
snapshots per group will be heavily skewed towards those groups which
have existed the longest, even though most of those old snapshots will
only be re-verified very infrequently.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
BackupInfo::list_backup_groups is identical code-wise, and makes more
sense as entry point for listing groups.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
WalkDir does not follow symlinks by default anyway, and this behaviour
is not documented anywhere. e.g., if a sysadmin mounts 'extra storage'
for some backup group or type (not knowing that only metadata is stored
in those directories), GC will ignore all the indices contained within
and happily garbage collect their chunks..
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
for safety reason, GC finds and marks all index files below the
datastore base path. as a result of regular operations, only index files
within the expected scheme of <TYPE>/<ID>/<TIMESTAMP> should exist.
add a small check + warning if the index list contains index files out
side of this expected scheme, so that an admin with shell access can
investigate.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
we have messages starting the phases anyway, and limit the number of
progress updates so that context remains available at all times.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
otherwise loading will run into the signature mismatch which is
technically true, but not the complete picture in this case.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
if the manifest is signed/the contained archives/blobs are encrypted.
stored in 'unprotected' area, since there is already a strong binding
between key and manifest via the signature, and this avoids breaking
backwards compatibility for a simple usability improvement.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
and set/generate it on
- key creation
- key passphrase change
- key decryption if not already set
- key encryption with master key
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Simplify the phase 2 code by treating .bad files just like regular
chunks, with the exception of stat logging.
To facilitate, we need to touch .bad files in phase 1. We only do this
under the condition that 1) the original chunk is missing (as before),
and 2) the original chunk is still referenced somewhere (since the code
lives in the error handler for a failed chunk touch, it only gets called
for chunks we expect to be there, i.e. ones that are referenced).
Untouched they will then be cleaned up after 24 hours (or after the last
longer-running task finishes).
Reason 2) is also a fix for .bad files not being cleaned up at all if
the original is no longer referenced anywhere (e.g. a user deleting all
snapshots after seeing some corrupt chunks appear).
cond_touch_path is introduced to touch arbitrary paths in the chunk
store with the same logic as touching chunks.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
by listing groups first, then filtering, then listing group snapshots.
this cuts down the number of openat/getdirents calls for users that just
have a partial view of the datastore.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
log invalid owners to system log, and continue with next group just as
if permission checks fail for the following operations:
- verify store with limited permissions
- list store groups
- list store snapshots
all other call sites either handle it correctly already (sync/pull), or
operate on a single group/snapshot and can bubble up the error.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
in the api we use PROXMOX_SAFE_ID_REGEX for backup ids, but here
(where we use it to list them) we use a local regex
since the first is a superset of the one used here, simply extend
the local one
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
index files that were smaller than their respective header size,
would fail with
"failed to fill whole buffer"
instead now check explicitely for the size and fail with
"index too small (size)"
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
try do reduce some unecessary lines, make match arms more precise so
one can faster see what's actually happening.
Also, avoid
> return Err(format_err!(...))
stuff, just use bail!()
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
for verifying a whole datastore. Datastore.Backup now allows verifying
only backups owned by the triggering user.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
in most generic places. this is accompanied by a change in
RpcEnvironment to purposefully break existing call sites.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Make it more clear that removed files are chunks (not indexes or
something like that, user cannot know that we do not touch them here)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
fixes commit b4fb262335, which copied
over the "Removed bad files:" block, but only adapted the log text,
not the actual variable.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
saves files mtime as i64 instead of u64 which enables backup of
files with negative mtime
the catalog_decode_i64 is compatible to encoded u64 values (if < 2^63)
but not reverse, so all "old" catalogs can be read with the new
decoder, but catalogs that contain negative mtimes will decode wrongly
on older clients
also remove the arbitrary maximum value of 2^63 - 1 for
encode_u64 (we just use up to 10 bytes now) and correctly
decode them and update the comments accordingly
adds also test for i64 encode/decode and for compatibility between
u64 encode and i64 decode
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
commit a4915dfc2b made a wrong fix, as
it did not observed that the last expressions was done under the
invariant that we had a last verification result, because if none
could be loaded we already returned true (include).
It thus broke the case for "never re-verify", which is important when
using multiple schedules, a more high frequent one for new,
unverified snapshots, and a low frequency to re-verify older snapshots,
e.g., monthly.
Fix this case again, rework the code to avoid this easy to oversee
invariant. Use a nested match to better express the implication of
each setting, and add some comments.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
and load it again when opening it
this way we can persist the status of the last garbage collect across
daemon reloads and reboots
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
To cater to the paranoid, a new datastore-wide setting "verify-new" is
introduced. When set, a verify job will be spawned right after a new
backup is added to the store (only verifying the added snapshot).
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Force consumers to use the lookup_datastore method instead of
potentially opening a datastore twice, and pass the config we have
already loaded into open_with_path, removing the need for open(1).
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>