it seems that sometimes, the child process signal gets handled
before the parent process signal. Systemd then ignores the
childs signal (finished reloading) and only after going into
reloading state because of the parent. this will never finish.
Instead, wait for the state to change to 'reloading' after sending
that signal in the parent, an only fork afterwards. This way
we ensure that systemd knows about the reloading before actually trying
to do it.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-By: Fabian Ebner <f.ebner@proxmox.com>
of ProcessLockSharedGuard.
We use a counter to determine if we can unlock the file again, but
we never actually decremented the writer count, so we held the
lock forever.
This fixes the issue that we could not start a garbage collect after
a reload, as long as the old process is still running, even when that
process has no active backup anymore but another long running task
(e.g. file download, terminal, etc.).
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
document all public things, add some doc links and make some
previously-public things only available for test cases or within the
crate:
previously public, now private:
- AclTreeNode::extract_user_roles (we have extract_roles())
- AclTreeNode::extract_group_roles (same)
- AclTreeNode::delete_group_role (exists on AclTree)
- AclTreeNode::delete_user_role (same)
- AclTreeNode::insert_group_role (same)
- AclTreeNode::insert_user_role (same)
- AclTree::write_config (we have save_config())
- AclTree::load (we have config()/cached_config())
previously public, now crate-internal:
- AclTree::from_raw (only used by tests)
- split_acl_path (used by some test binaries)
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
instead of just logging the error. this should never happen in practice
unless someone is messing with the keyfile, in which case, it's better
to abort.
update tests accordingly (wrong fingerprint should fail, no fingerprint
should get the expected one).
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
the RSA key and the encryption key itself are hard-coded to avoid
stalling the test runs because of lack of entropy, they have no special
significance otherwise.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
when restoring an encrypted key, the original one is obviously not
available to check the fingerprint with.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
this fixes the issue that on some filesystems, you cannot recursively
remove a directory when you hold a lock on a file inside (e.g. nfs/cifs)
it is not really backwards compatible (so during an upgrade, there
could be two daemons have the lock), but since the locking was
broken before (see previous patch) it should not really matter
(also it seems very unlikely that someone will trigger this)
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
'lock_manifest' returns a Result<File, Error> so we always got the result,
even when we did not get the lock, but we acted like we had.
bubble the locking error up
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
if no groups were found, the task log was very confusing as it
contained no real information why nothing was synced, e.g.:
Starting datastore sync job 'remote:datastore:local-datastore:s-79412799-e6ee'
Sync datastore 'local-datastore' from 'remote/datastore'
sync job 'remote:datastore:local-datastore:s-79412799-e6ee' end
TASK OK
this patch simply logs how many groups were found and are about to be synced:
Starting datastore sync job 'remote:datastore:local-datastore:s-79412799-e6ee'
Sync datastore 'local-datastore' from 'remote/datastore'
found 0 groups to sync
sync job 'remote:datastore:local-datastore:s-79412799-e6ee' end
TASK OK
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
percentage of verified groups, interpolating based on snapshot count
within the group. in most cases, this will also be closer to 'real'
progress since added snapshots (those which will be verified) in active
backup groups will be roughly evenly distributed, while number of total
snapshots per group will be heavily skewed towards those groups which
have existed the longest, even though most of those old snapshots will
only be re-verified very infrequently.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
BackupInfo::list_backup_groups is identical code-wise, and makes more
sense as entry point for listing groups.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
WalkDir does not follow symlinks by default anyway, and this behaviour
is not documented anywhere. e.g., if a sysadmin mounts 'extra storage'
for some backup group or type (not knowing that only metadata is stored
in those directories), GC will ignore all the indices contained within
and happily garbage collect their chunks..
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
for safety reason, GC finds and marks all index files below the
datastore base path. as a result of regular operations, only index files
within the expected scheme of <TYPE>/<ID>/<TIMESTAMP> should exist.
add a small check + warning if the index list contains index files out
side of this expected scheme, so that an admin with shell access can
investigate.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
we have messages starting the phases anyway, and limit the number of
progress updates so that context remains available at all times.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
before adding more fields to the tuple, let's just create the struct
inside the match arms to improve readability.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
and use this information to add more information to client backup log
and guide the download manifest decision.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
the errors Vec can contain failed groups as well (e.g., if a group has
no or an invalid owner).
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
else users have to manually search through a potentially very long task
log to find the entries that are different.. this is the same summary
printed at the end of a manual verify task.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
from formatting functions to main function, and pass along the key data
lines instead of the full string.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
this is stricter than the check that happened on manifest load, as it
also fails if the manifest is signed but we don't have a key available.
add some additional output at the start of a backup to indicate whether
a previous manifest is available to base the backup on.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
otherwise loading will run into the signature mismatch which is
technically true, but not the complete picture in this case.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
if the manifest is signed/the contained archives/blobs are encrypted.
stored in 'unprotected' area, since there is already a strong binding
between key and manifest via the signature, and this avoids breaking
backwards compatibility for a simple usability improvement.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
and set/generate it on
- key creation
- key passphrase change
- key decryption if not already set
- key encryption with master key
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
since we systemd-encode parts of the upid string, and those can contain
characters that are invalid in urls (e.g. '\'), we have to percent encode
those
add a 'percent_encode_component' helper, so that we can maybe change
the AsciiSet for all uses at the same time
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Simplify the phase 2 code by treating .bad files just like regular
chunks, with the exception of stat logging.
To facilitate, we need to touch .bad files in phase 1. We only do this
under the condition that 1) the original chunk is missing (as before),
and 2) the original chunk is still referenced somewhere (since the code
lives in the error handler for a failed chunk touch, it only gets called
for chunks we expect to be there, i.e. ones that are referenced).
Untouched they will then be cleaned up after 24 hours (or after the last
longer-running task finishes).
Reason 2) is also a fix for .bad files not being cleaned up at all if
the original is no longer referenced anywhere (e.g. a user deleting all
snapshots after seeing some corrupt chunks appear).
cond_touch_path is introduced to touch arbitrary paths in the chunk
store with the same logic as touching chunks.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
unprivileged users should only see the counts related to their part of
the datastore.
while we're at it, switch to a list groups, filter groups, count
snapshots approach (like list_snapshots) to speedup calls to this
endpoint when many unprivileged users share a datastore.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
used in the PBS GUI, but also for PVE usage queries which don't need all
the extra expensive information..
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
by listing groups first, then filtering, then listing group snapshots.
this cuts down the number of openat/getdirents calls for users that just
have a partial view of the datastore.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Useful to avoid the need for a long (and possibly changing) list of include-dev
options in certain situations, e.g. nested ZFS file systems. The option is
already implemented and seems to work as expected. The checks for virtual
filesystems are not affected by this option.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>