There is a race upon reload, where it can happen that:
1. systemd forks off /bin/kill -HUP $MAINPID
2. Current instance forks off new one and notifies systemd with the
new MAINPID.
3. systemd sets new MAINPID.
4. systemd receives SIGCHLD for the kill process (which is the current
control process for the service) and reads the PID of the old
instance from the PID file, resetting MAINPID to the PID of the old
instance.
5. Old instance exits.
6. systemd receives SIGCHLD for the old instance, reads the PID of the
old instance from the PID file once more. systemd sees that the
MAINPID matches the child PID and considers the service exited.
7. systemd receivese notification from the new PID and is confused.
The service won't get active, because the notification wasn't
handled.
To fix it, update the PID file before sending the MAINPID
notification, similar to what a comment in systemd's
src/core/service.c suggests:
> /* Forking services may occasionally move to a new PID.
> * As long as they update the PID file before exiting the old
> * PID, they're fine. */
but for our Type=notify "before sending the notification" rather than
"before exiting", because otherwise, the mix-up in 4. could still
happen (although it might not actually be problematic without the
mix-up in 6., it still seems better to avoid).
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
without that one gets a "failed to lookup datastore X" in the log for
every datastore that is in read-only or offline maintenance mode,
even if they aren't scheduled for GC anyway.
Avoid that by first opening the datastore through a Lookup operation,
and only re-open it as Write op once we know that GC needs to get
scheduled for it.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
RDD update did not use lookup_datastore() and therefore bypassed
the maintenance mode checks. This adds the needed check directly.
Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>
Allow pulling all groups from a certain source namespace, and
possibly sub namespaces until max-depth, into a target namespace.
If any sub-namespaces get pulled, they will be mapped relatively from
the source parent namespace to the target parent namespace.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
we do not have normal GET variables available in the checks provided
by the rest server from the api macro, so do it manually.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
this is the most common sequence of checks we have in this file, so
let's have a single place where we implement it.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
these happen together very often.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
else this could leak existence of datastore.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
the helper now takes both high-privilege and lesser-privilege privs, so
the resulting bool can be used to quickly check whether additional
checks like group ownership are needed or not.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
We decided to go this route because it'll most likely be
safer in the API as we need to explicitly add namespaces
support to the various API endpoints this way.
For example, 'pull' should have 2 namespaces: local and
remote, and the GroupFilter (which would otherwise contain
exactly *one* namespace parameter) needs to be applied for
both sides (to decide what to pull from the remote, and what
to *remove* locally as cleanup).
The *datastore* types still contain the namespace and have a
`.backup_ns()` getter.
Note that the datastore's `Display` implementations are no
longer safe to use as a deserializable string.
Additionally, some datastore based methods now have been
exposed via the BackupGroup/BackupDir types to avoid a
"round trip" in code.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
We probably can combine the base permission + owner check, but for
now add explicit ones to upfront so that the change is simpler as
only one thing is done.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
allow to list any namespace with privileges on it and allow to create
and delete namespaces if the user has modify permissions on the parent
namespace.
Creation is only allowed if the parent NS already exists.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
these are the ones for non-#[api] methods, also fill in the
namespace in prune operations
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Make it easier by adding an helper accepting either group or
directory
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
used_datastores returned the 'target', but in the full_restore_worker,
we interpreted it as the source and searched for a mapping
(which we then locked)
since we cannot return a HashSet of Arc<T> (missing Hash trait on DataStore),
we have now a map of source -> target
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
thanks to commit 70142e607dda43fc778f39d52dc7bb3bba088cd3 from
proxmox repos's proxmox-http crate
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
and remove already fixed fixmes.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
[ T: squash in cargo fmt fixup for some trailing ws ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
pull_store is the entrypoint used by other code, the rest does not need
to be visible at all.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
else this might remove groups which are not part of the pull scope. note
that setting/using remove_vanished already checks the required privs
earlier.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
On reload the old process hands over to the new process but needs to
keep running until all its worker tasks are finished to avoid
breaking a in-progress action like a xterm.js web shell or a backup
creation/restore.
During that wait time the receiving channel was already closed, but
the TCP sockt accept listener was still left active by mistake.
That paired with the `SO_REUSEPORT` being set on the underlying
socket, made the kernel choose either the old or new process for new
incoming connections, both still listened for them after all and
reuse-port + multiple processes is often used as load-balancer
mechanism.
As the old proxy accepted connections but didn't process them anymore
one could observer sporadic connection failures on any API call, well
any new connection to the proxy, depending on which process got the
it assigned.
The fix is to stop accepting new connections one we shutdown, so poll
the shutdown_future too during accept and just exit the accept-loop
on shutdown.
Note: This part of the code, nor other parts that could influence it,
wasn't changed at all in recent times, so it's still unresolved for
why it pops up only now.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Co-authored-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
[ T: add more (root cause) info and reword a bit ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Returning the GC status was dropped by mistake in commit 762f7d15
("datastore status: factor out api type DataStoreStatusListItem")
As this is considered a breaking change which we also felt, due to
the gc-status being used in the web interface for the datastore
overview list (not the dashboard), re add it.
Fixes: 762f7d15
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
[ T: add reference to breaking commit, reword message ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
data blobs can only appear in a BackupDir (snapshot) in the backup
hierachy, so makes more sense that it lives in there.
As it wasn't widely used anyway it's easy to move the single
non-package call site over to the new one directly and drop the
implementation from Datastore completely.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
> The hash set will be able to hold at least capacity elements
> without reallocating. If capacity is 0, the hash set will not
> allocate.
-- rustdoc, HashSet::with_capacity
So, the number we pass is the amount of chunk "IDs" we safe, which is
then 64Ki, not 16Ki and thus the size we can reference too is also
256 GiB, not 64 GiB.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
And drop the base_path parameter on a first bunch of
functions (more reordering will follow).
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
like for the index, instead of manually stripping it.
this (and the previous change) is backwards-compatible since `Remote`
already skipped serializing empty strings, so the returned JSON is
identical.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
And use the api-types for their contents.
These are supposed to be instances for a datastore, the pure
specifications are the ones in pbs_api_types which should be
preferred in crates like clients which do not need to deal
with the datastore directly.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
datastore's group_path will be moved to BackupDir soon and
this is required to be able to properly distinguish them
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>