Use timeout futures for sections that might hang in certain error
conditions. This is mostly intended as a safeguard, not a first line of
defense - i.e. best-effort avoidance of total hangs.
Not every future used by the HttpClient/H2Client is changed, only those
where a quick response is expected. For example, the response-reading
futures are left alone, so data transfer is never capped by a timeout;
only the initial server connect is.
A timeout is also applied when upgrading to H2 connections, as that can
take a long time on overloaded servers.
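A rough sketch of the pattern (tokio-style; the constant, address
handling and error mapping here are illustrative, not the actual client
code):

    use std::time::Duration;
    use tokio::net::TcpStream;
    use tokio::time::timeout;

    // illustrative value; the real timeout is chosen per call site
    const CONNECT_TIMEOUT: Duration = Duration::from_secs(10);

    async fn connect_with_timeout(addr: &str) -> Result<TcpStream, anyhow::Error> {
        // only the connect future is wrapped; response-reading futures
        // stay uncapped so large transfers are never cut off
        let stream = timeout(CONNECT_TIMEOUT, TcpStream::connect(addr))
            .await
            .map_err(|_| anyhow::format_err!("connect to {} timed out", addr))??;
        Ok(stream)
    }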
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
it seems that sometimes, the child process' signal gets handled before
the parent process' signal. systemd then ignores the child's signal
(finished reloading), because it only goes into the reloading state
afterwards, triggered by the parent - so the reload never finishes.
Instead, wait for the state to change to 'reloading' after sending that
signal in the parent, and only fork afterwards. This way we ensure that
systemd knows about the reloading before we actually try to do it.
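a minimal sketch of the parent-side wait (polling via systemctl is an
assumption here; the actual implementation may query the unit state
differently):

    use std::process::Command;
    use std::thread::sleep;
    use std::time::Duration;

    // hypothetical helper: after sending RELOADING=1, block until
    // systemd actually reports the unit as 'reloading' - only then is
    // it safe to fork the child
    fn wait_for_reload_state(unit: &str) -> Result<(), anyhow::Error> {
        for _ in 0..50 {
            let output = Command::new("systemctl")
                .args(["is-active", unit])
                .output()?;
            if String::from_utf8_lossy(&output.stdout).trim() == "reloading" {
                return Ok(());
            }
            sleep(Duration::from_millis(100));
        }
        anyhow::bail!("unit '{}' never entered 'reloading' state", unit)
    }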
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-By: Fabian Ebner <f.ebner@proxmox.com>
decrement the writer count in the drop handler of
ProcessLockSharedGuard.
We use a counter to determine if we can unlock the file again, but
we never actually decremented the writer count, so we held the
lock forever.
This fixes the issue that we could not start a garbage collection after
a reload as long as the old process was still running, even when that
process had no active backup anymore, just another long-running task
(e.g. file download, terminal, etc.).
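a simplified sketch of the missing decrement (field names and the
unlock mechanics are illustrative, not the actual ProcessLocker
internals):

    use std::sync::atomic::{AtomicUsize, Ordering};

    struct ProcessLockSharedGuard {
        writers: &'static AtomicUsize, // number of live shared guards
    }

    impl Drop for ProcessLockSharedGuard {
        fn drop(&mut self) {
            // this decrement was missing: the count never reached zero,
            // so the file lock was held until the process exited
            if self.writers.fetch_sub(1, Ordering::SeqCst) == 1 {
                // last guard dropped - release the underlying file lock
            }
        }
    }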
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
document all public things, add some doc links and make some
previously-public things only available for test cases or within the
crate:
previously public, now private:
- AclTreeNode::extract_user_roles (we have extract_roles())
- AclTreeNode::extract_group_roles (same)
- AclTreeNode::delete_group_role (exists on AclTree)
- AclTreeNode::delete_user_role (same)
- AclTreeNode::insert_group_role (same)
- AclTreeNode::insert_user_role (same)
- AclTree::write_config (we have save_config())
- AclTree::load (we have config()/cached_config())
previously public, now crate-internal:
- AclTree::from_raw (only used by tests)
- split_acl_path (used by some test binaries)
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
error out on a fingerprint mismatch instead of just logging the error.
this should never happen in practice unless someone is messing with the
keyfile, in which case it's better to abort.
update tests accordingly (wrong fingerprint should fail, no fingerprint
should get the expected one).
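a sketch of the tightened check (names and signature are illustrative):

    use anyhow::bail;

    fn check_fingerprint(expected: Option<&str>, computed: &str) -> Result<(), anyhow::Error> {
        match expected {
            // previously a mismatch was only logged; now it aborts
            Some(expected) if expected != computed => {
                bail!("wrong key fingerprint: expected {}, got {}", expected, computed)
            }
            // no stored fingerprint: the caller records the computed one
            _ => Ok(()),
        }
    }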
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
the RSA key and the encryption key itself are hard-coded to avoid
stalling the test runs because of lack of entropy; they have no special
significance otherwise.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
when restoring an encrypted key, the original one is obviously not
available to check the fingerprint against, so skip that check.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
this fixes the issue that on some filesystems (e.g. nfs/cifs), you
cannot recursively remove a directory while holding a lock on a file
inside it.
it is not really backwards compatible (so during an upgrade, two
daemons could hold the lock at the same time), but since the locking
was broken before (see previous patch) it should not really matter
(it also seems very unlikely that someone will trigger this).
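a sketch of the idea - the lock file lives in a dedicated lock
directory derived from the snapshot path instead of inside the
directory that may get removed (paths and naming are illustrative):

    use std::fs::File;
    use std::os::unix::io::AsRawFd;
    use std::path::Path;
    use nix::fcntl::{flock, FlockArg};

    fn lock_snapshot(lock_dir: &Path, snapshot: &str) -> Result<File, anyhow::Error> {
        // encode the snapshot path into a file name under the lock dir
        let path = lock_dir.join(format!("{}.lck", snapshot.replace('/', ":")));
        let file = File::create(&path)?;
        // removing the snapshot directory itself no longer touches a
        // locked file, which is what broke removal on nfs/cifs
        flock(file.as_raw_fd(), FlockArg::LockExclusiveNonblock)
            .map_err(|err| anyhow::format_err!("locking {:?} failed - {}", path, err))?;
        Ok(file)
    }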
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
'lock_manifest' returns a Result<File, Error>, so we always got a value
back even when we did not actually get the lock, but we acted as if we
had. bubble the locking error up instead.
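in code, the fix boils down to propagating the Result instead of just
binding it (sketch; the helper below is a stand-in):

    use std::fs::File;
    use anyhow::Error;

    // stand-in for the real helper; returns the lock guard on success
    fn lock_manifest() -> Result<File, Error> {
        File::open("/path/to/manifest.lck").map_err(Error::from)
    }

    fn verify() -> Result<(), Error> {
        // before: `let _guard = lock_manifest();` bound the unchecked
        // Result, so a failed lock was ignored and we acted as if we
        // held it - the `?` now bubbles the locking error up
        let _guard = lock_manifest()?;
        // ... verification runs while the manifest lock is held ...
        Ok(())
    }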
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
if no groups were found, the task log was very confusing as it
contained no real information about why nothing was synced, e.g.:
    Starting datastore sync job 'remote:datastore:local-datastore:s-79412799-e6ee'
    Sync datastore 'local-datastore' from 'remote/datastore'
    sync job 'remote:datastore:local-datastore:s-79412799-e6ee' end
    TASK OK
this patch simply logs how many groups were found and are about to be synced:
    Starting datastore sync job 'remote:datastore:local-datastore:s-79412799-e6ee'
    Sync datastore 'local-datastore' from 'remote/datastore'
    found 0 groups to sync
    sync job 'remote:datastore:local-datastore:s-79412799-e6ee' end
    TASK OK
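the change itself is essentially one log line before the sync loop,
roughly (println! stands in for the worker task's own log helper):

    fn sync_groups(groups: &[String]) {
        // stand-in for the worker-task log producing the line above
        println!("found {} groups to sync", groups.len());
        for _group in groups {
            // ... sync each group ...
        }
    }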
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
log progress as the percentage of verified groups, interpolating based
on snapshot count within the group. in most cases, this will also be
closer to the 'real' progress, since added snapshots (those which will
be verified) in active backup groups will be roughly evenly
distributed, while the number of total snapshots per group will be
heavily skewed towards those groups which have existed the longest,
even though most of those old snapshots will only be re-verified very
infrequently.
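the interpolation, as a sketch (names are illustrative): a fully
verified group counts as 1, and the group currently being verified
counts as its fraction of finished snapshots:

    fn verify_progress(
        done_groups: usize,     // groups fully verified
        total_groups: usize,
        done_snapshots: usize,  // snapshots finished in the current group
        group_snapshots: usize, // total snapshots in the current group
    ) -> f64 {
        if total_groups == 0 {
            return 100.0;
        }
        let current = if group_snapshots == 0 {
            0.0
        } else {
            done_snapshots as f64 / group_snapshots as f64
        };
        (done_groups as f64 + current) / total_groups as f64 * 100.0
    }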
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
BackupInfo::list_backup_groups is identical code-wise and makes more
sense as the entry point for listing groups.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
WalkDir does not follow symlinks by default, and this behaviour is not
documented anywhere. e.g., if a sysadmin mounts 'extra storage' for
some backup group or type (not knowing that only metadata is stored in
those directories), GC will ignore all the indices contained within and
happily garbage collect their chunks.
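a sketch of the adjusted scan using the walkdir crate's follow_links
option (the extension filter is simplified):

    use std::path::Path;
    use walkdir::WalkDir;

    fn list_index_files(base: &Path) -> Vec<walkdir::DirEntry> {
        WalkDir::new(base)
            .follow_links(true) // the default is *not* to follow symlinks
            .into_iter()
            .filter_map(|entry| entry.ok())
            .filter(|entry| {
                let name = entry.file_name().to_string_lossy();
                // .fidx/.didx are the fixed/dynamic index files GC marks
                name.ends_with(".fidx") || name.ends_with(".didx")
            })
            .collect()
    }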
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
for safety reasons, GC finds and marks all index files below the
datastore base path. as a result of regular operations, only index
files within the expected scheme of <TYPE>/<ID>/<TIMESTAMP> should
exist. add a small check + warning if the index list contains index
files outside of this expected scheme, so that an admin with shell
access can investigate.
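a sketch of such a check - relative to the datastore base, a
well-formed index path has exactly the four components
<TYPE>/<ID>/<TIMESTAMP>/<file> (helper name is illustrative):

    use std::path::{Component, Path};

    fn has_expected_scheme(base: &Path, index: &Path) -> bool {
        match index.strip_prefix(base) {
            Ok(rel) => rel
                .components()
                .filter(|c| matches!(c, Component::Normal(_)))
                .count() == 4,
            Err(_) => false,
        }
    }
    // callers would count files failing this check and log a warning so
    // an admin with shell access can investigate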
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
we have messages marking the start of each phase anyway, and limit the
number of progress updates so that context remains available at all
times.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>