ApiConfig: avoid using pbs_config::backup_user()
CommandoSocket: avoid using pbs_config::backup_user()
FileLogger: avoid using pbs_config::backup_user()
- use atomic_open_or_create_file()
Auth Trait: moved definitions to proxmox-rest-server/src/lib.rs
- removed CachedUserInfo patrameter
- return user as String (not Authid)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Factor out open_backup_lockfile() method to acquire locks owned by
user backup with permission 0660.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
checks for PRIV_DATASTORE_MODIFY, or else if the auth_id is the backup
owner, and skips the group if not.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
so that we can reuse that information
the removal of the adding to the corrupted list is ok, since
'get_chunks_in_order' returns them at the end of the list
and we do the same if the loading fails later in 'verify_index_chunks'
so we still mark them corrupt
(assuming that the load will fail if the stat does)
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
when we remove a datastore via api/cli, the proxy
has sometimes leftover references to that datastore in its
DATASTORE_MAP which includes an open filehandle on the
'.lock' file
this prevents unmounting/exporting the datastore even after removal,
only a reload/restart of the proxy did help
add a command to our command socket, which removes all non
configured datastores from the map, dropping the open filehandle
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
before reading the chunks from disk in the order of the index file,
stat them first and sort them by inode number.
this can have a very positive impact on read speed on spinning disks,
even with the additional stat'ing of the chunks.
memory footprint should be tolerable, for 1_000_000 chunks
we need about ~16MiB of memory (Vec of 64bit position + 64bit inode)
(assuming 4MiB Chunks, such an index would reference 4TiB of data)
two small benchmarks (single spinner, ext4) here showed an improvement from
~430 seconds to ~330 seconds for a 32GiB fixed index
and from
~160 seconds to ~120 seconds for a 10GiB dynamic index
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
found and semi-manually replaced by using:
codespell -L mut -L crate -i 3 -w
Mostly in comments, but also email notification and two occurrences
of misspelled 'reserved' struct member, which where not used and
cargo build did not complain about the change, soo ...
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
with the fix for #2909 (improving handling missing chunks), we
changed from bailing to warning during a garbage collection when
updating the atime of a chunk.
but, updating the atime can not only fail when the chunk is missing,
but also on other occasions, e.g. no permissions or more importantly,
no space left on the device. in that case, the atime of a valid and used
chunk cannot be updated, and the second sweep of the gc will remove that chunk.
[0] is a real world example of that happening.
instead, only warn on really missin chunks, and bail on all other
errors.
0: https://forum.proxmox.com/threads/pbs-server-full-two-days-later-almost-empty.83274/
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
the error message don't make sense with an empty default
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
this fixes the issue that on some filesystems, you cannot recursively
remove a directory when you hold a lock on a file inside (e.g. nfs/cifs)
it is not really backwards compatible (so during an upgrade, there
could be two daemons have the lock), but since the locking was
broken before (see previous patch) it should not really matter
(also it seems very unlikely that someone will trigger this)
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
'lock_manifest' returns a Result<File, Error> so we always got the result,
even when we did not get the lock, but we acted like we had.
bubble the locking error up
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
WalkDir does not follow symlinks by default anyway, and this behaviour
is not documented anywhere. e.g., if a sysadmin mounts 'extra storage'
for some backup group or type (not knowing that only metadata is stored
in those directories), GC will ignore all the indices contained within
and happily garbage collect their chunks..
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
for safety reason, GC finds and marks all index files below the
datastore base path. as a result of regular operations, only index files
within the expected scheme of <TYPE>/<ID>/<TIMESTAMP> should exist.
add a small check + warning if the index list contains index files out
side of this expected scheme, so that an admin with shell access can
investigate.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
we have messages starting the phases anyway, and limit the number of
progress updates so that context remains available at all times.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Simplify the phase 2 code by treating .bad files just like regular
chunks, with the exception of stat logging.
To facilitate, we need to touch .bad files in phase 1. We only do this
under the condition that 1) the original chunk is missing (as before),
and 2) the original chunk is still referenced somewhere (since the code
lives in the error handler for a failed chunk touch, it only gets called
for chunks we expect to be there, i.e. ones that are referenced).
Untouched they will then be cleaned up after 24 hours (or after the last
longer-running task finishes).
Reason 2) is also a fix for .bad files not being cleaned up at all if
the original is no longer referenced anywhere (e.g. a user deleting all
snapshots after seeing some corrupt chunks appear).
cond_touch_path is introduced to touch arbitrary paths in the chunk
store with the same logic as touching chunks.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
try do reduce some unecessary lines, make match arms more precise so
one can faster see what's actually happening.
Also, avoid
> return Err(format_err!(...))
stuff, just use bail!()
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
in most generic places. this is accompanied by a change in
RpcEnvironment to purposefully break existing call sites.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Make it more clear that removed files are chunks (not indexes or
something like that, user cannot know that we do not touch them here)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
fixes commit b4fb262335, which copied
over the "Removed bad files:" block, but only adapted the log text,
not the actual variable.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
and load it again when opening it
this way we can persist the status of the last garbage collect across
daemon reloads and reboots
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
To cater to the paranoid, a new datastore-wide setting "verify-new" is
introduced. When set, a verify job will be spawned right after a new
backup is added to the store (only verifying the added snapshot).
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Force consumers to use the lookup_datastore method instead of
potentially opening a datastore twice, and pass the config we have
already loaded into open_with_path, removing the need for open(1).
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Avoid races when updating manifest data by flocking a lock file.
update_manifest is used to ensure updates always happen with the lock
held.
Snapshot deletion also acquires the lock, so it cannot interfere with an
outstanding manifest write.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>